A NEW SENSITIVE METHOD FOR QUANTIFYING
ACTIVE TRANSFORMING GROWTH FACTOR-BETA
AND COMPOSITIONS THEREFOR

Technical Field

The present invention relates to a sensitive assay method
for quantifying the amount of active transforming growth factor
beta (TGF-β) and to vector compositions for use therein for
expressing an indicator molecule in response to TGF-β
activation of a TGF-β response element in the vector.

Background

Transforming growth factor beta, hereinafter referred to
as TGF-β, is a 25 kilodalton (kD) homodimeric protein that
belongs to a family of regulators of cell growth and
differentiation that includes activins, inhibins, Mullerian
inhibiting substance, the Drosophila decapentaplegic complex
and bone morphogenic proteins. For review, see Massague, Ann.
Rev. Cell Biol., 6:597-641 (1990); Roberts et al., in Peptide
Growth Factors and Their Receptors, Sporn et al., Eds.,
Springer-Verlag, Berlin, 1:419-472 (1990); and Hoffman, Curr.
Opin. Cell Biol., 3:947-952 (1991). TGF-β was initially
defined by its ability to induce morphological transformation
of fibroblastic cells in monolayer culture and to stimulate
colony formation in soft agar. Delarco et al., Proc. Natl.
Acad. Sci., USA, 75:4001-4005 (1978) and Todaro et al., Proc.
Natl. Acad. Sci., USA, 77:5258-5262 (1980).

Three distinct molecular isoforms of TGF-β, the genes of
which are located on different chromosomes, have been
identified in mammals and are designated TGF-β1, TGF-β2 and
TGF-β3. Derynck et al., Nature, 316:701-705 (1985); Hanks et
al., Proc. Natl. Acad. Sci., USA, 85:71-72 (1988); and Madisen
et al., DNA, 7:1-8 (1988). Each of the isoforms is first
synthesized as a high molecular weight latent or inactive
precursor polypeptide that is then processed to 12.5 kD
monomers. Activation of the latent complex can occur through a
variety of physicochemical or enzymatic treatments as well as in
various tissue culture systems. For review, see Barnard et
al., Biochim. Biophys. Acta, 1032:79-87 (1990). Two processed
monomers then dimerize to form biologically active TGF-β.

The activation process must occur to allow binding of the
dimerized TGF-β to the high affinity TGF-β receptors expressed
on the surfaces of all normal cells and most neoplastic
cells. Tucker et al., Proc. Natl. Acad. Sci., USA,
81:6757-6761 (1984); Frolik et al., J. Biol. Chem., 259:10995-11000
(1984); Pircher et al., Biochem. Biophys. Res. Commun.,
136:30-37 (1986).

Although some TGF-β activation systems generate the mature
TGF-β in nanogram quantities, the majority liberate picogram
amounts. These low concentrations, however, are sufficient to
induce a variety of biological responses such as macrophage
chemotaxis (Wahl et al., Proc. Natl. Acad. Sci., USA,
84:5788-5792 (1987)), inhibition of endothelial cell migration and
proliferation (Heimark et al., Science, 233:1078-1080 (1986)),
stimulation of extracellular matrix deposition (Ignotz et al.,
J. Biol. Chem., 261:4337-4345 (1986)) and decreased plasminogen
activator (PA) activity as a result of decreased PA production
(Laiho et al., J. Cell Biol., 103:2403-2410 (1986) and
Flaumenhaft et al., J. Cell. Physiol., 152:48-55 (1992)) along
with increased secretion of its inhibitor, plasminogen
activator inhibitor-1 (PAI-1) (Laiho et al., J. Biol. Chem.,
262:17467-17474 (1987)).

PAI-1 is the primary inhibitor of both tissue-type
plasminogen activator (t-PA) and urokinase-type plasminogen
activator (u-PA), and as such is a potent anti-fibrinolytic
molecule. PAI-1 synthesis by cultured cells in vitro is
induced by a variety of molecules including cytokines, growth
factors, hormones, and other agents such as endotoxin and
phorbol myristate acetate. Nuclear transcription run-on assays
demonstrate that the regulation of PAI-1 by many of these
agents, including TGF-β, occurs primarily at the level of
transcription.

TGF-β released from platelets may be an important negative
regulator of the fibrinolytic system of the vessel wall since
the TGF-β in releasates of thrombin-activated platelets causes
large increases in PAI-1 synthesis by endothelial cells. This
increased PAI-1 synthesis may account for the resistance of
platelet-rich thrombi to thrombolytic therapy. The
accumulation of PAI-1 in the extracellular matrix in response
to TGF-β protects matrix proteins from proteolytic degradation.
Thus, the induction of PAI-1 by TGF-β may also play a role in
both wound healing and fibrotic responses.

These and other biological effects of TGF-β activity have
been used to develop a variety of semiquantitative and
quantitative bioassays including those based on chondrogenesis,
inhibition of DNA synthesis and cell growth, differentiation,
migration or PA activity. Differentiation-based assays include
the induction of cartilage-specific proteoglycan expression
(ED50 = 5 ng/ml; 200 pM) (Ogawa et al., in Peptide Growth
Factors, Barnes et al., Eds., Academic Press Inc., 198:317-327
(1991); Seyedin et al., Proc. Natl. Acad. Sci., USA,
82:2267-2271 (1985)) and inhibition of rat L6 myoblast differentiation
(ED50 = 0.2 ng/ml; 8 pM) (Florini et al., J. Biol. Chem.,
261:16509-16513 (1986)). An ED50 represents the half-maximal
amount of factor required to produce an effect, activation or
inhibition, on differentiation of target cells. The
abbreviations ng/ml, pg/ml, nM and pM respectively stand for
nanograms/milliliter, picograms/milliliter, nanomolar and
picomolar. These assays are utilized primarily for studying
differentiation rather than for quantification of TGF-β.
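
The paired mass and molar values quoted in this section (for example, 5 ng/ml alongside 200 pM) are consistent with the 25 kD molecular weight stated above for the TGF-β homodimer. The following minimal sketch shows that arithmetic. The function names and the default molecular weight parameter are illustrative assumptions and are not part of the original disclosure.

```python
def ng_per_ml_to_pM(conc_ng_per_ml, mol_weight_kd=25.0):
    """Convert a mass concentration (ng/ml) to a molar concentration (pM).

    1 ng/ml equals 1e-6 g/L. Dividing by the molecular weight in g/mol
    (kD * 1000) gives mol/L, and multiplying by 1e12 gives pM, which
    simplifies to conc * 1000 / kD.
    The 25 kD default is the homodimer weight quoted in the text.
    """
    return conc_ng_per_ml * 1000.0 / mol_weight_kd


def pM_to_pg_per_ml(conc_pM, mol_weight_kd=25.0):
    """Inverse conversion: molar concentration (pM) to pg/ml."""
    return conc_pM * mol_weight_kd


if __name__ == "__main__":
    # ED50 values quoted in the text for the differentiation-based assays.
    print(ng_per_ml_to_pM(5.0))   # cartilage proteoglycan induction
    print(ng_per_ml_to_pM(0.2))   # rat L6 myoblast differentiation
```

Under these assumptions, 5 ng/ml and 0.2 ng/ml correspond to 200 pM and 8 pM, matching the values given in the text.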

Assays based on the ability of TGF-β to inhibit DNA synthesis
and cell growth in mink lung epithelial cells (MLE cells) (ED50
= 10-20 pg/ml; 0.4-0.8 pM) (Lucas et al., in Peptide Growth
Factors, Barnes et al., Eds., Academic Press Inc., 198:303-316
(1991) and Danielpour et al., J. Cell. Physiol., 138:79-86
(1989)), African green monkey kidney epithelial cells (ED50 = 1
ng/ml; 40 pM) (Holley et al., Proc. Natl. Acad. Sci., USA,
77:5989-5992 (1980)), rat hepatocytes (ED50 = 0.4 ng/ml; 16 pM)
(Nakamura et al., Biochem. Biophys. Res. Commun., 133:1042-1050
(1985)), and fetal bovine heart endothelial cells (ED50 =
75-125 pg/ml; 3-5 pM) (Qian et al., Proc. Natl. Acad. Sci., USA,
89:6290-6294 (1992)) are sensitive but can be affected by a
variety of molecules such as insulin, EGF, PDGF, and bFGF.

Migration and plasminogen activator (PA) activity assays
have also been described. The migration of bovine aortic
endothelial cells (BAEs) into a denuded area of a monolayer is
inhibited by TGF-β (ED50 = 2 ng/ml; 80 pM; sensitivity 10-20
pg/ml; 0.4-0.8 pM) (Sato et al., J. Cell Biol., 107:1199-1205
(1988); Sato et al., J. Cell Biol., 109:309-315 (1989); and
Sato et al., J. Cell Biol., 111:757-763 (1990)). Migration of
BAEs, however, can be simultaneously stimulated by endogenously
or exogenously supplied bFGF that can abrogate the inhibitory
effect of TGF-β (Sato et al., J. Cell Biol., 107:1199-1205
(1988)). The PA assay for measurement of TGF-β concentration
is very sensitive and rapid (Flaumenhaft et al., J. Cell.
Physiol., 152:48-55 (1992)). The assay is based on the ability
of TGF-β to decrease PA activity of BAEs by inhibiting PA
synthesis and secretion and by inducing expression of its
inhibitor, PAI-1. This assay, however, is also sensitive to
other molecules, such as bFGF, that can alter PA activity
(Flaumenhaft et al., J. Cell. Physiol., 152:48-55 (1992) and
Sato et al., J. Cell Biol., 107:1199-1205 (1988)). The ED50 of
the assay varies from 1 to 35 pg/ml (0.04-1.4 pM) of TGF-β
depending on differences in basal PA levels and sensitivity to
TGF-β among primary BAE cultures.

The ability of TGF-β to stimulate PAI-1 expression has
recently been used to study TGF-β receptors. Wrana et al.,
Cell, 71:1003-1014 (1992) transiently transfected a PAI-1
luciferase construct together with a human type II TGF-β
receptor expression vector into TGF-β resistant MLE cells.
This luciferase construct contained a short, synthetic TGF-β
response element based on the human PAI-1 promoter and was used
to report functional expression of the receptor. Although only
used to screen transfected mutant cell lines, this construct
appeared to be less sensitive to TGF-β than the constructs of
this invention when transiently transfected into MLE cells, and
no information was reported regarding its dose-responsiveness
or specificity.

In another study of the TGF-β stimulation of PAI-1
expression, Riccio et al., Mol. Cell. Biol., 12:1846-1855
(1992), transiently transfected TGF-β responsive cells with
constructs containing varying regions of the 5'-flanking domain
of the human PAI-1 gene to determine the transcription
regulatory mechanism used by TGF-β. All the constructs
contained the gene encoding the enzyme chloramphenicol
acetyltransferase to provide for an indirect determination of
the transcriptional effect of the various constructs. With
this approach, a 67 base pair region was identified that
contained binding sites for two proteins, a member of the
CCAAT-binding transcription factor-nuclear factor I family and
USF factor. Both sites were
necessary to obtain TGF-β induction. The constructs, however,
were not utilized in assays to determine dose-responsiveness
or to measure the amount of TGF-β in a sample.

The most specific assays for TGF-β are the radioreceptor assay,
radioimmunoassay (RIA), and enzyme-linked immunosorbent assay
(ELISA). Radioreceptor assays using a variety of cell types,
such as A549 human lung carcinomas and murine AKR-2B, have
been described and have ranges of 125 pg/ml to 25 ng/ml (5 pM-1
nM) with ED50 of approximately 0.5 ng/ml (20 pM). See
Wakefield et al., J. Cell. Biol., 105:965-975 (1987); Sato et
al., J. Cell Biol., 111:757-763 (1990); Lucas et al., in
Peptide Growth Factors, Barnes et al., Eds., Academic Press Inc.,
198:303-316 (1991) and O'Connor-McCourt et al., J. Biol. Chem.,
262:14090-14099 (1987). RIAs specific for TGF-β1 and β2 have
ED50s of 12 and 37 pM, respectively (Danielpour et al., J. Cell.
Physiol., 138:79-86 (1989)). Others, using different
antibodies, describe the range of TGF-β1 specific RIAs to be
6.25-200 ng/ml (0.25-8 nM), with a sensitivity of 2.4 ng/ml
(0.1 nM) (Lucas et al., in Peptide Growth Factors, Barnes et
al., Eds., Academic Press Inc., 198:303-316 (1991)). As
demonstrated by the differences in these results, the
affinities of the antibodies can greatly alter the sensitivity
of the assay.

Isoform-specific double antibody or sandwich ELISAs
(SELISA) are also very sensitive to the affinities of the
antibodies. One such assay, using two different monoclonal
antibodies specific for TGF-β1, had a useful range of 0.63 to
40 ng/ml (0.025-1.6 nM) (Lucas et al., in Peptide Growth
Factors, Barnes et al., Eds., Academic Press Inc., 198:303-316
(1991)). Using a combination of isoform-specific turkey and
rabbit antibodies, Danielpour et al., J. Cell. Physiol.,
138:79-86 (1989) created a SELISA with detection limits of 2-5 pg/well
(20-50 pg/ml; 0.8-2 pM). Although highly sensitive and
specific, SELISAs such as these are not readily available and
are expensive.

Although all of these other TGF-β assays can detect mature
TGF-β, the low concentrations (<2 pM) generated in various
biological systems make many of them impractical without prior
concentration of the sample. This can result in large losses
of the mature growth factor or, more importantly, activation of
latent TGF-β. Moreover, many of the assays are complicated to
establish and can be influenced by other factors present in the
samples, thus reducing their utility for accurately measuring
the amount of TGF-β in the sample. For this reason, a need
exists for a relatively simple, sensitive and nonconfounding
assay for TGF-β.



Brief Description of the Invention 

A highly sensitive and specific, non-radioactive assay
for mature (active) TGF-β has now been developed. When
compared to the sensitive and widely used proliferation-based
MLEC method for measuring TGF-β concentration, the TGF-β assay
method of this invention is more rapid, has comparable
sensitivity, and has a greater detection range. Specificity of
this novel assay was also higher as evidenced by its relative
insensitivity to factors such as EGF and bFGF which can greatly
affect other assays. Because it uses a truncated PAI-1 promoter
that does not respond to other growth modulators such as PDGF
found in biological samples, the method of this invention can
be used in conditions where other bioassays are difficult to
interpret. Because of its large range and specificity, the
rapid, sensitive, non-radioactive, easily performed assay
method of this invention is useful in determining active TGF-β
concentrations in complex solutions.
Thus, the present invention overcomes the limitations of
existing methods used to quantify the amount of TGF-β in a
liquid sample. This invention contemplates a method for
quantifying the amount of TGF-β in a sample using a system
comprising a TGF-β responsive cell containing an expression
vector having a regulatory region comprising a TGF-β response
element operatively linked to a promoter and having a
structural region encoding an indicator molecule. Following
TGF-β induced activation of the TGF-β response element,
transcription results in the expression of an indicator
molecule, the amount of which allows for the measurement of the
amount of TGF-β responsible for the induced activation.

In particular, one embodiment of the invention
contemplates a method for quantifying the amount of TGF-β in a
liquid sample, which method comprises:

(a) incubating the liquid sample together with eucaryotic
cells that contain a TGF-β responsive expression vector having
a gene encoding luciferase for a predetermined time period
sufficient for the eucaryotic cells to express a detectable
amount of the luciferase;

(b) measuring the amount of the luciferase expressed
during the time period; and

(c) determining the amount of TGF-β present in the sample
by comparing the measured amount of the luciferase against a
reference curve.

The invention further contemplates that the reference
curve represents a quantitative relationship derived from a
series of measured amounts of luciferase produced from a series
of known concentrations of TGF-β.
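
As an illustration of how a measured luciferase signal might be read against such a reference curve, the following minimal sketch inverts a piecewise-linear curve built from standards of known concentration. The interpolation scheme, the function name and the standard values are assumptions made for illustration only. The text does not specify how the curve is fitted, and the numbers below are not measured data.

```python
from bisect import bisect_left


def concentration_from_signal(signal, standards):
    """Estimate a TGF-beta concentration (pM) from a luciferase signal.

    standards: iterable of (concentration_pM, signal) pairs measured
    from known amounts of TGF-beta. The signal must rise strictly with
    concentration over the range used; readings outside that range are
    rejected rather than extrapolated.
    """
    points = sorted(standards)
    concs = [c for c, _ in points]
    signals = [s for _, s in points]
    if any(b <= a for a, b in zip(signals, signals[1:])):
        raise ValueError("reference curve must be strictly increasing")
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal lies outside the reference curve")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return concs[i]
    # Linear interpolation between the two bracketing standards.
    x0, x1 = concs[i - 1], concs[i]
    y0, y1 = signals[i - 1], signals[i]
    return x0 + (signal - y0) * (x1 - x0) / (y1 - y0)


# Hypothetical standards: (pM of TGF-beta, relative light units).
HYPOTHETICAL_STANDARDS = [(0.0, 100.0), (1.0, 300.0), (2.0, 500.0), (4.0, 900.0)]

if __name__ == "__main__":
    print(concentration_from_signal(400.0, HYPOTHETICAL_STANDARDS))
```

Rejecting out-of-range readings is a design choice in this sketch: a sample whose signal exceeds the highest standard would normally be diluted and re-assayed rather than extrapolated.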

Another embodiment of the invention contemplates a method
for quantifying the amount of transforming growth factor-β
(TGF-β) in a liquid sample comprising:

(a) providing, in eucaryotic cells capable of expressing
an indicator molecule, a plasmid comprising, in the direction
of transcription, a regulatory region that includes at least
one TGF-β inducible response element that is operatively linked
to a promoter, and a structural region downstream of the
promoter, where the response element is capable of inducing
dose-dependent indicator molecule activity and where the
structural region codes for the indicator molecule;

(b) incubating the liquid sample with the eucaryotic 
cells for a predetermined time period sufficient for the 
eucaryotic cells to express a detectable amount of the 
indicator molecule; 

(c) measuring the amount of the indicator molecule 
expressed during the time period; and 

(d) comparing the measured amount of the indicator
molecule produced in step (c) with the amount of indicator
molecule produced in a control assay performed according to
steps (a) through (c) by treating the liquid sample with an
anti-TGF-β antibody to obtain a net measured amount of the
indicator molecule induced by TGF-β.
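
The comparison in step (d) can be sketched as a simple subtraction of the antibody-treated control signal from the untreated sample signal. The function name and the clamping of negative differences to zero are illustrative assumptions and are not prescribed by the text.

```python
def net_indicator_signal(sample_signal, antibody_control_signal):
    """Return the portion of the indicator signal attributable to TGF-beta.

    sample_signal: indicator amount measured for the untreated sample.
    antibody_control_signal: indicator amount measured for the same
    sample after treatment with a neutralizing anti-TGF-beta antibody.
    A control that reads higher than the sample is treated as zero net
    signal (an assumption of this sketch, reflecting measurement noise).
    """
    if sample_signal < 0 or antibody_control_signal < 0:
        raise ValueError("signals must be non-negative")
    return max(sample_signal - antibody_control_signal, 0.0)


if __name__ == "__main__":
    # Hypothetical readings in relative light units.
    print(net_indicator_signal(1250.0, 350.0))
```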

Contemplated for use with the methods of this invention
are plasmids having identifying characteristics of plasmids on
deposit with ATCC having the ATCC Accession Numbers 75627,
75628 and 75629. Also contemplated are stably transformed
eucaryotic cells that contain the TGF-β response element having
the nucleotide sequence in SEQ ID NO 11 where the cells
correspond to cells on deposit with ATCC having the ATCC
Accession Number CRL 11508.

The invention describes plasmids for use in the methods
that comprise a nucleotide sequence corresponding to nucleotide
sequences listed in SEQ ID NOs 1-10. TGF-β inducible response
elements that comprise a nucleotide sequence corresponding to
nucleotide sequences listed in SEQ ID NOs 11-17 are also
described. Contemplated promoter nucleotide sequences are
listed in SEQ ID NOs 18 and 19.

In a further embodiment of the methods of the invention, the
eucaryotic cells are stably transformed cells containing a
plasmid having a gene encoding a selectable marker for the
selection of said stably transformed cells. The invention
describes such plasmids having nucleotide sequences listed in
SEQ ID NOs 1-6. The invention further describes a stably
transformed eucaryotic cell on deposit with ATCC having ATCC
Accession Number CRL 11508 containing the TGF-β response
element having the nucleotide sequence in SEQ ID NO 11.

In an additional embodiment, the eucaryotic cells are
transiently transformed with plasmids corresponding to
the nucleotide sequences listed in SEQ ID NOs 7-10.

The invention describes quantifying the amount of TGF-β in
a body fluid, in culture medium, and in a tissue extract. A
further preferred embodiment is the determination of the amount
of a specific isoform of TGF-β, specifically TGF-β1, TGF-β2 or
TGF-β3, in a liquid sample.

In a preferred embodiment, this invention describes the 
use of mammalian cells. Preferred mammalian cells include mink
lung epithelial cells, HeLa cells, Chinese hamster ovary cells, 
Hep3B cells, GM7373 cells, and NIH 3T3 cells. 

A preferred indicator molecule also described for use with
the methods of this invention is a chemiluminescent molecule,
preferably luciferase.

The invention describes a composition of a plasmid vector
capable of causing expression of an indicator molecule in a
eucaryotic cell, where the plasmid contains nucleotide
sequences comprising a regulatory region that includes at least
one TGF-β inducible response element operatively linked to a
promoter, a structural region downstream of said promoter and
coding for said indicator molecule, and a gene encoding a
selectable marker for the selection of a stably transformed
cell, where the response element is capable of inducing
dose-dependent luciferase activity.

In preferred embodiments, plasmids with selectable marker
genes have the nucleotide sequences corresponding to SEQ ID NOs
1-6. Preferred TGF-β inducible response elements for use in
the expression vectors of this invention have the nucleotide
sequences corresponding to SEQ ID NOs 11-17.

A further preferred embodiment of the expression vectors 
of this invention is the use of the neomycin gene for selecting 
stable transformants, the nucleotide sequence of which is 
listed in SEQ ID NO 20. 

The invention further describes plasmids lacking a 
selectable marker gene having the identifying characteristics 
of plasmid ATCC Accession Numbers 75627, 75628, 75629, 
corresponding to SEQ ID NOs 8-10, respectively. 

The invention describes a eucaryotic cell containing a 
plasmid having a nucleotide sequence listed in SEQ ID NOs 1-10.

Also contemplated are kits useful in assaying the amount of
TGF-β in a liquid sample, comprising (a) packaging material;
(b) eucaryotic cells capable of expressing an indicator
molecule and containing a plasmid of this invention; and an
aliquot of TGF-β, where the latter is used for generating a
reference curve.

Other embodiments will be apparent to one skilled in the
art.

Brief Description of the Drawings

Figure 1 shows the structure and construction of the
p800neoLuc expression vector. p800Luc was digested with AccI
and blunt-ended. pMAMneo was then digested with SalI and
EcoRI, blunt-ended, and the fragment containing the
neomycin-resistance gene (neo^r) was ligated to the linearized
p800Luc to form p800neoLuc. Clones were analyzed via restriction enzyme
mapping and one clone with the proper insert was selected.
(MCS, multiple cloning site; PA1, 2, 3, polyadenylation regions
1, 2, and 3). The details of the construction are described in
Example 1A.

Figure 2A, having an inset (Figure 2B), shows the
dose-dependent induction of the plasminogen activator
inhibitor-1/luciferase (PAI/L) construct in p800neoLuc expression vector
in stably transformed MLE cells by TGF-β1, TGF-β2, and TGF-β3.
The TGF-β assay was performed as described in Example 3 with
DMEM-BSA containing the indicated picomolar (pM)
concentrations of recombinant (r) TGF-β1 (closed squares), TGF-β2 (closed
circles), or TGF-β3 (closed triangles) on the X-axis. The
amount of expressed luciferase detected by a luminometer is
plotted on the Y-axis and is expressed in relative light units
(RLU). The results shown in Figures 2A, 2B and 2C are
described in Example 3B. Figure 2B shows the treatment of
p800neoLuc-transformed MLE cells with all three TGF-β isoforms
in a TGF-β assay that resulted in a linear dose-response over
the range of 0 to 4 pM of TGF-β. In Figure 2C, the TGF-β assay
was performed with 8 pM rTGF-β1, TGF-β2 or TGF-β3 in DMEM-BSA
in the presence (cross-hatched bars) or absence (open bars) of
100 µg/ml of anti-TGF-β1, TGF-β2 and TGF-β3 monoclonal antibody.
Baseline induction is indicated by medium alone (filled bars).

Figures 3A, 3B, 3C and 3D show the effects of medium, cell
density and incubation time on sensitivity of the TGF-β assay
as described in Example 3B with the amount of TGF-β1 plotted on
the X-axis in pM against the measured RLU on the Y-axis. In
Figure 3A, the assay was performed with increasing rTGF-β1
concentrations in DMEM (closed squares), alpha-MEM (closed
circles), CMEM (closed triangles; Eagle's MEM supplemented with
non-essential amino acids) or RPMI-1640 (closed diamonds;
Bio-Whittaker). All media contained 0.1% BSA. In Figure 3B,
increasing concentrations of rTGF-β1 in DMEM, 0.1% BSA were
measured using 3.2 x 10^4 (closed squares), 1.6 x 10^4 (closed
circles), or 0.8 x 10^4 (closed triangles) clone 32 (C32)
mink lung epithelial cells/well (MLE cells) after a three hour
attachment period. Samples were incubated with the cells for
14 hours prior to assaying for luciferase activity. In Figures
3C and 3D (an inset in Figure 3C), 1.6 x 10^4 C32 cells were
allowed to attach for 3 hours prior to addition of the
indicated concentrations of rTGF-β1. The samples were
incubated for 6 (closed squares), 14 (closed circles), or 22
(closed triangles) hours prior to assaying for luciferase
activity. The results are described in Example 3B.

Figures 4A and 4B show the effects of growth factors on
the TGF-β assay and MLEC assay while Figure 4C shows the
effects caused by serum. For all figures, either the growth
factors or TGF-β are plotted on the X-axis against the RLU on
the Y-axis. In Figure 4A, the TGF-β assays were performed with
DMEM-BSA containing the indicated concentrations of rTGF-β1
(closed squares), recombinant human bFGF (closed circles),
recombinant IL-1alpha (closed triangles), recombinant PDGF-BB
(closed diamonds), or EGF (open squares). In Figure 4B, TGF-β
assays were performed with DMEM-BSA containing 1 pM rTGF-β1
(closed squares) and the indicated concentrations of
recombinant human bFGF (closed circles), recombinant IL-1alpha
(closed triangles), recombinant PDGF (closed triangles), or EGF
(open squares). The assays and results are described in
Example 3C. In Figure 4C, TGF-β assays were performed with
DMEM-BSA containing the indicated concentrations of rTGF-β1
alone (closed squares) or with 0.5% (closed circles), 1%
(closed triangles), or 2% (closed diamonds) calf serum. The
assays and results are described in Example 3D.

Figure 5 shows the comparison of conditioned media (CMs)
assayed by the TGF-β (shown as the PAI/L assay) and MLEC
assays. DMEM-BSA (closed
squares), COS (X-marked lines), BSM (closed triangles) or BAE
(closed circles) cell conditioned medium (CM) with the
indicated concentrations of rTGF-β1 were assayed by PAI/L
(TGF-β) assay (broken line) as measured by RLU on the right-hand
Y-axis and MLEC (unbroken line) assay as measured by tritiated
thymidine (3H-thymidine) incorporation percent of controls
as described in Example 3E. The data points were normalized to
DMEM-BSA.

WO 95/19987
PCT/US95/01153

Figure 6 shows the effects of growth factors on DNA
synthesis as measured by 3H-thymidine incorporation percent of
control. In the graph, DMEM-BSA containing rTGF-β1 (closed
squares), TGF-β2 (closed circles), TGF-β3 (closed triangles),
recombinant human bFGF (closed diamonds), recombinant IL-1alpha
(open squares), EGF (open circles), or recombinant PDGF-BB
(open triangles) were separately assayed using the MLEC assay
as described in Example 3C.



Detailed Description of the Invention
A. Definitions
Recombinant DNA (rDNA) Molecule: A DNA molecule
produced by operatively linking two DNA segments. Thus, a
recombinant DNA molecule is a hybrid DNA molecule comprising at
least two nucleotide sequences not normally found together in
nature. rDNAs not having a common biological origin, i.e.,
evolutionarily different, are said to be "heterologous".

Vector: A rDNA molecule capable of autonomous replication
in a cell and to which a DNA segment, e.g., gene or
polynucleotide, can be operatively linked so as to bring about
replication of the attached segment. Vectors capable of
directing the expression of genes encoding one or more
polypeptides are referred to herein as "expression vectors".

Upstream: In the direction opposite to the direction of
DNA transcription, and therefore going from 5' to 3' on the
non-coding strand, or 3' to 5' on the mRNA.

Downstream: Further along a DNA sequence in the direction
of sequence transcription or read out, that is, traveling in a
3'- to 5'-direction along the non-coding strand of the DNA or
5'- to 3'-direction along the RNA transcript.

Reading Frame: Particular sequence of contiguous
nucleotide triplets (codons) employed in translation that
define the structural protein encoding-portion of a gene, or
structural gene. The reading frame depends on the location of
the translation initiation codon.
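
To illustrate how the reading frame depends on the position of the initiation codon, the following minimal sketch splits a sense-strand sequence into codons starting at a chosen position. The function name, the example sequence and the use of the standard ATG start and TAA/TAG/TGA stop codons are illustrative assumptions and are not taken from the sequences of this disclosure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons_in_frame(sense_strand, start_index=None):
    """Split a DNA sense strand into codons for one reading frame.

    If start_index is None, the frame begins at the first ATG found.
    Reading stops after the first stop codon or when fewer than three
    nucleotides remain.
    """
    seq = sense_strand.upper()
    if start_index is None:
        start_index = seq.find("ATG")
    if start_index < 0:
        return []  # no initiation codon present
    codons = []
    for i in range(start_index, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        codons.append(codon)
        if codon in STOP_CODONS:
            break
    return codons


if __name__ == "__main__":
    example = "CCATGGCTTAAGG"  # hypothetical sequence
    print(codons_in_frame(example))      # frame set by the ATG
    print(codons_in_frame(example, 3))   # frame shifted by one base
```

Shifting the starting position by a single nucleotide yields an entirely different set of triplets from the same sequence, which is why the initiation codon fixes the frame.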

Response Element: Also referred to as an enhancer
element, a short DNA sequence that occurs further upstream
than the upstream promoter element. Response elements contain
specific nucleotide sequences recognized by transcription
factors that are DNA-binding proteins.

Promoter: A region on a DNA molecule, generally from 100
to 200 base pairs long, upstream from the coding sequence; an
area to which the RNA polymerase initially binds prior to the
initiation of transcription. The nucleotide sequence of the
promoter, or at least part of it, determines the nature of the
polymerase that associates with it. Certain consensus
sequences, CAT and TATA boxes, within the promoter region are
important for binding of RNA polymerase.

Regulatory Region: A DNA control module upstream from the
coding sequence containing an upstream promoter element and
response elements, the latter of which are also referred to as
enhancer elements.

Growth Factor: A small protein that binds to a receptor
for controlling cell proliferation.

Receptor: A molecule, such as a protein, glycoprotein and
the like, that can specifically (non-randomly) bind to another
molecule. Receptors of one type are plasma membrane proteins
that bind specific molecules including growth factors,
hormones, or neurotransmitters, resulting in the transmission
of a signal to the cell's interior causing the cell to respond
in a specific manner.

Sense Strand: A nucleotide sequence referred to as a
sense strand of a double-stranded deoxyribonucleic acid
sequence is the nucleotide sequence that when read in the 5' to
3' direction by the genetic code defines an amino acid sequence
of interest. Alternatively, the sense strand is referred to as a
coding strand.

B. Transforming Growth Factor-β (TGF-β)

Transforming growth factor-β, hereinafter referred to
as TGF-β, is a growth inhibitor that exhibits a diversity of
biological activities in addition to its effects on cellular
proliferation. TGF-β belongs to a large family of related
molecules with a wide range of regulatory activities as
described in the Background. For review, see Barnard et al.,
Biochim. Biophys. Acta, 1032:79-87 (1990), the disclosure of
which is hereby incorporated by reference.

As previously discussed, TGF-β is produced and secreted
from cells in three distinct molecular isoforms, the
genes of which are located on different chromosomes. These
isoforms have been
identified in mammals and are designated TGF-β1, TGF-β2 and
TGF-β3. Derynck et al., Nature, 316:701-705 (1985); Hanks et
al., Proc. Natl. Acad. Sci., USA, 85:71-72 (1988); and Madisen
et al., DNA, 7:1-8 (1988). Each of the isoforms is
synthesized as a high molecular weight latent or inactive
precursor polypeptide that is then processed to 12.5 kD
monomers that then dimerize to form biologically active, also
referred to as mature, TGF-β.

The activation process must occur to allow binding of the
dimerized TGF-β to the high affinity TGF-β receptors expressed
on the surfaces of all normal cells and most neoplastic
cells. Tucker et al., Proc. Natl. Acad. Sci., USA,
81:6757-6761 (1984); Frolik et al., J. Biol. Chem., 259:10995-11000
(1984); Pircher et al., Biochem. Biophys. Res. Commun.,
136:30-37 (1986).

TGF-β has been shown to induce the increased secretion of
the inhibitor, plasminogen activator inhibitor-1 (PAI-1) (Laiho
et al., J. Biol. Chem., 262:17467-17474 (1987)). PAI-1 is the
primary inhibitor of both tissue-type plasminogen activator
(t-PA) and urokinase-type plasminogen activator (u-PA), and as
such is a potent anti-fibrinolytic molecule. As a consequence
of PAI-1 induction by TGF-β, the activity of plasminogen
activator (PA) is decreased. The cascade of
activation of plasminogen to plasmin is thereby inhibited,
along with the subsequent degradation of fibrin.

While induction of PAI-1 synthesis by TGF-β has been shown to occur
primarily at the level of transcription following the TGF-β
receptor-ligand interaction, the mechanism of activation of the
PAI-1 promoter resulting in the transcription of the PAI-1 gene
is less well understood. Studies of PAI-1 gene transcription
have shown that the signal transduction mechanisms are
independent of de novo protein synthesis as determined by the
lack of inhibition by cycloheximide and rapid onset of
induction as described by Sawdey et al., J. Biol. Chem.,
264:10396-10401 (1989), the disclosure of which is hereby
incorporated by reference. The TGF-β-induced enhancement of
promoter activity for the alpha2 collagen gene has been shown
to be mediated by a binding site for nuclear factor I as
described by Sporn et al., J. Cell Biol., 105:1039-1045 (1987).

As shown in Example 4, the PAI-1 promoter contains
AP-1-like nucleotide sequences which are bound by the AP-1
heterodimeric transcription factor complex of Fos and Jun
protein subunits. Although AP-1-like DNA enhancer sites are
present in PAI-1, as shown in Example 4, activation of these
sites by the AP-1 heterodimeric complex was independent of the
TGF-β-mediated induction of PAI-1 synthesis.

Although neither the exact transcriptional mechanism of PAI-1
promoter activation following TGF-β receptor-ligand interaction
nor the identity of the responsible TGF-β-related transcription
factor is known, the activation of a TGF-β response element of
this invention following TGF-β occupancy of the TGF-β receptor
will be referred to as TGF-β-induced activation. Since the
TGF-β response element is activated by TGF-β resulting in the
induction of indicator protein expression, the TGF-β response
element is also referred to as a TGF-β inducible response
element.

C. TGF-β Response Elements

The present invention is based on the discovery that
when eucaryotic cells, transformed with a TGF-β-responsive
expression vector of this invention, were exposed to liquid
samples of TGF-β, the resulting expression of an indicator
molecule was dose-dependent in relationship to the amount of
TGF-β present in the sample. Thus, the present invention
provides for a method to quantify the amount of TGF-β in a
liquid sample by measuring the amount of indicator molecules
expressed.

The induced expression of the indicator molecules was the
result of activation of TGF-β response elements present in the
regulatory region of the TGF-β responsive expression vectors,
the latter of which are described in Section D.

In practicing this invention, the regulation of
transcription in the TGF-β responsive expression
vector-transformed eucaryotic cells is dependent on TGF-β. As
described above, the TGF-β occupation of the TGF-β receptor
expressed on the surface of cells results in the activation of a
TGF-β-related transcription factor. In general, transcription
factors are site-specific DNA-binding proteins. Typically,
positioned 5' to a structural gene is a region of
nucleotide sequences that is responsible for controlling
transcription. This region has been termed the "control
module".

The control module comprises two categories of regulatory
sequences, the promoter element and the enhancer elements. The
promoter is referred to as an upstream promoter as it lies
upstream of the structural genes. Promoter elements are
usually 100 to 200 base pairs long and the segment of DNA is
relatively close to the site of initiation of transcription. A
particular sequence recognized by one of several transcription
factors that are known to bind to the promoter region is the
TATA box, a region that is rich in A-T base pairs.

The enhancer regions are also referred to as response
regions or response elements. Thus the term "TGF-β response
element" can also be designated "TGF-β enhancer", "TGF-β
enhancer region", or "TGF-β response region", and the like.
The enhancer region is hereinafter referred to as a response
element. Response elements are short DNA segments that occur
further upstream from the initiation site than the upstream
promoter element. Response elements contain specific sequences
that are recognized by transcription factors. The response
elements are often a few thousand base pairs 5' to the promoter
but may even be 20,000 base pairs or more distant.

The binding of a transcription factor to a nucleotide
sequence comprising either a response element or a promoter
resembles an "on switch". In the context of the present
invention, the binding of the TGF-β-related transcription
factor results in the dose-dependent activation of the promoter
resulting in the transcription of a structural region gene from
DNA into RNA. In most cases, the resulting RNA molecule serves
as a template for synthesis of a specific molecule, such as the
indicator molecule of this invention.

Thus, "activation" of a TGF-β response element refers to a
process whereby the functional state of the TGF-β response
element is altered. The result of the TGF-β activation of the
TGF-β response element is an increase in the transcriptional
efficiency of the structural gene driven from the promoter.

A further embodiment of a TGF-β response element is that
it is inducible. The term "inducible" refers to an
enhancement of a particular function. In this invention, the
functional activity of a TGF-β response element is increased or
induced following activation by the TGF-β-related transcription
factor. Thus, the TGF-β response element is also referred to
as a TGF-β inducible response element.

The result of TGF-β response element activation is the
coordinate transcription and translation of the structural
region containing a gene encoding an indicator protein of this
invention as described in Section D. The resulting expression
of an indicator molecule is dose-dependent in relationship to
the amount of TGF-β present in the sample.

The term "dose-dependent" refers to the functional
relationship between the amount of TGF-β activating the TGF-β
response element and the resulting expression of the indicator
molecule. Thus, the functional relationship between TGF-β
activation and expression of an indicator molecule can be
referred to as a linear relationship. Because of the
dose-dependent expression of an indicator molecule, such as
luciferase, in response to TGF-β exposure, the amount of TGF-β
responsible for the activation of the expression can be readily
determined using the methods of this invention.
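
As an illustration of how a linear dose-dependent relationship can be
used for quantitation, the following Python sketch fits a straight
line to indicator readings obtained with known TGF-β standards and
then inverts the fit to estimate the amount of TGF-β in an unknown
sample. The standard amounts, the light-unit readings and the
function names are hypothetical and invented for illustration only;
they are not data or procedures taken from the Examples.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_tgf_beta(reading, slope, intercept):
    """Invert the standard curve to estimate the TGF-beta amount."""
    if slope <= 0:
        raise ValueError("standard curve must have a positive slope")
    return (reading - intercept) / slope


# Hypothetical standards: TGF-beta amount (arbitrary units) versus
# relative light units. Values are invented for illustration only.
standards = [0.0, 1.0, 2.0, 4.0]
light_units = [100.0, 300.0, 500.0, 900.0]

slope, intercept = fit_line(standards, light_units)
unknown = estimate_tgf_beta(700.0, slope, intercept)
```

A reading is only meaningful within the range spanned by the
standards; extrapolating beyond the highest or lowest standard
assumes the linear relationship continues to hold there.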

Thus, based on the teachings herein, a TGF-β response
element nucleotide sequence is characterized by its ability to
be responsive to TGF-β-induced activation. Such a TGF-β
response element is useful herein as a component in the
expression vectors of this invention to provide for the ability
to quantify the amount of TGF-β responsible for the
transcriptional activation. Thus, a TGF-β response element of
this invention comprises any nucleotide sequence that is
activated by TGF-β, the process of which is as described in
Section B.

In the context of this invention, the term nucleotide 
sequence refers to a plurality of joined nucleotide units 
formed from naturally or non-naturally occurring bases and
cyclofuranosyl groups joined by phosphodiester bonds. Thus, 
the nucleotide sequence includes the use of nucleotide analogs. 

One embodiment of a TGF-β response element of this
invention is an isolated double-stranded deoxyribonucleic acid
molecule comprising a sequence of nucleotide bases that defines
a TGF-β response element. However, it is necessary neither
that the obtained TGF-β response element be a naturally
occurring sequence present in other genes nor that the TGF-β
response element be limited to deoxyribonucleotides. The TGF-β
response element may be found in DNA or RNA, in regulatory
sequences, exons, or introns.

Preferred TGF-β response elements are derived from
selected regions of the promoter region of the plasminogen
activator inhibitor type 1 gene, hereinafter referred to as
PAI-1, as described by Loskutoff et al., Biochem., 26:3763-3768
(1987), the disclosure of which is hereby incorporated by
reference. Loskutoff et al. describe a cosmid containing the
entire PAI-1 gene. In a related study, the glucocorticoid
regulation of the PAI-1 promoter was described by van Zonneveld
et al., Proc. Natl. Acad. Sci., 85:5525-5529 (1988), the
disclosure of which is hereby incorporated by reference. The
sequence of the PAI-1 promoter beginning at nucleotide
position -800 and extending through the TATA box and
initiation site and ending at nucleotide position +200, the
latter of which corresponds to the PAI-1 encoded protein at the
ninth amino acid residue, is available in the GenBank™/EMBL
Data Bank with Accession Number J03836.

Moreover, Bosma et al., J. Biol. Chem., 263:9129-9141
(1986), have described the entire 15,867 bp PAI-1 gene sequence
including significant stretches of DNA that extend into its 5'-
and 3'-flanking DNA regions, the nucleotide sequence of which
is available in the GenBank™/EMBL Data Bank with Accession
Number J03764.

The PAI-1 promoter-derived TGF-β response elements for use
in this invention are identified by the nucleotide positions
corresponding to the region in the PAI-1 promoter as listed in
the GenBank™/EMBL Data Bank Accession Number J03836.

Exemplary TGF-β response elements derived from the PAI-1
promoter have the nucleotide sequences listed in the Sequence
Listing in SEQ ID NOs 11-17. The nucleotide sequences are
listed showing only the sense strand in the 5' to 3' direction
of a double-stranded isolated TGF-β response element nucleotide
sequence. The PAI-1-derived TGF-β response elements
corresponding to SEQ ID NOs 11-17 have the respective
designations with the nucleotide regions corresponding to the
PAI-1 promoter indicated in parentheses: 1) SEQ ID NO 11 =
1500 (-1481 to -40); 2) SEQ ID NO 12 = 800 (-800 up to -40); 3)
SEQ ID NO 13 = 800/636 (-800 up to -636); 4) SEQ ID NO 14 = 56
(-56 to -41); 5) SEQ ID NO 15 = 674 (-674 to -650); 6) SEQ ID
NO 16 = 743 (-743 to -708); and 7) SEQ ID NO 17 = 732 (-732 to
-708).

In one embodiment, a TGF-β response element useful for
practicing the present invention may be derived from any
promoter nucleotide sequence. In a further embodiment, a TGF-β
response element may be designed to contain preselected
nucleotide bases. In other words, a subject TGF-β response
element need not be identical to the nucleotide sequence of the
PAI-1-derived TGF-β response elements described herein, so long
as the nucleotide sequence is activatable by TGF-β.

A TGF-β response element of this invention thus may
contain a variety of nucleotide units of any length, typically
from about 5 to about 2000 nucleotides in length. More
preferably, a TGF-β response element comprises nucleotide units
from about 15 to about 1500 nucleotides in length.
A preferred embodiment is a TGF-β response element having
a nucleotide sequence that is greater than 50 base pairs in
length. Exemplary long TGF-β response elements derived from
PAI-1 are listed in the Sequence Listing in SEQ ID NOs 11-13.

Another preferred embodiment is a TGF-β response element having
a nucleotide sequence that is less than 50 base pairs in length.
Exemplary short TGF-β response elements derived from PAI-1 are
listed in the Sequence Listing in SEQ ID NOs 14-17.
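
The grouping into long and short response elements can be checked
against the promoter coordinates given above for SEQ ID NOs 11-17.
The Python sketch below is an illustration only: it assumes that both
endpoints of each stated range are included in the element (the text
uses both "to" and "up to", so the true lengths in the Sequence
Listing may differ by a base), and the names used are hypothetical.

```python
# Promoter coordinates as stated in the text for SEQ ID NOs 11-17:
# SEQ ID NO -> (designation, 5' position, 3' position)
RESPONSE_ELEMENTS = {
    11: ("1500", -1481, -40),
    12: ("800", -800, -40),
    13: ("800/636", -800, -636),
    14: ("56", -56, -41),
    15: ("674", -674, -650),
    16: ("743", -743, -708),
    17: ("732", -732, -708),
}

LONG_THRESHOLD_BP = 50  # cutoff between long and short elements


def element_length(start, end):
    """Length in base pairs, assuming both endpoints are included."""
    return end - start + 1


def classify(seq_id):
    """Return 'long' or 'short' for a SEQ ID NO using the 50 bp cutoff."""
    _, start, end = RESPONSE_ELEMENTS[seq_id]
    return "long" if element_length(start, end) > LONG_THRESHOLD_BP else "short"


long_ids = [i for i in RESPONSE_ELEMENTS if classify(i) == "long"]
short_ids = [i for i in RESPONSE_ELEMENTS if classify(i) == "short"]
```

Under this assumption the coordinate-derived lengths place SEQ ID NOs
11-13 in the long group and SEQ ID NOs 14-17 in the short group,
consistent with the grouping stated in the text.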

In one embodiment, the invention contemplates the presence
of at least one TGF-β response element present in the
regulatory region of the expression vectors as described in
Section D. Thus, one or more stretches of a nucleotide
sequence comprising a TGF-β response element may be present
within a regulatory region. If more than one TGF-β response
element is present, they are not required to be identical. In
other words, TGF-β response elements having different
nucleotide sequences as well as different lengths can be
combined in a regulatory region of an expression vector of this
invention.

TGF-β response elements can be derived or produced from
the PAI-1 promoter by truncation or expansion of the native or
wild-type PAI-1 promoter nucleotide sequence or as a variant of
the native PAI-1 promoter by site-directed substitution of a
preselected nucleotide base or bases.

Also contemplated in this context are regulatory regions
containing multiple TGF-β response elements that can be
longer, shorter, tandemly arranged, reversed in orientation,
and permutations thereof. The design and construction of such
arrangements are well known to one of ordinary skill in the art
of oligonucleotide design and synthesis and are described by
Sambrook et al., Molecular Cloning: A Laboratory Manual, Cold
Spring Harbor Laboratory, pp 390-401 (1982).

It is also contemplated that nucleotide base modifications
can be made resulting in nucleotide analogs to provide certain
advantages to the TGF-β response elements of this invention.

A nucleotide analog refers to a moiety that functions
similarly to nucleotide sequences in a TGF-β response element
of this invention but which has non-naturally occurring
portions. Thus, nucleotide analogs can have altered sugar
moieties or inter-sugar linkages. Exemplary are the
phosphorothioate and other sulfur-containing species, analogs
having altered base units, or other modifications consistent
with the spirit of this invention.

Preferred modifications include, but are not limited to,
the ethyl or methyl phosphonate modifications disclosed in
U.S. Patent No. 4,469,863 and the phosphorothioate modified
deoxyribonucleotides described by LaPlanche et al., Nucl. Acids
Res., 14:9081 (1986) and Stec et al., J. Am. Chem. Soc.,
106:6077 (1984), the disclosures of which are hereby
incorporated by reference. These modifications provide
resistance to nucleolytic degradation. Preferred modifications
are the modifications of the 3'-terminus using phosphorothioate
(PS) sulfurization modification described by Stein et al.,
Nucl. Acids Res., 16:3209 (1988).

TGF-β response elements comprising nucleotide sequences
can be obtained by a variety of procedures well known in the
art, including de novo chemical synthesis of complementary
oligonucleotides and derivation of nucleic acid fragments from
native nucleic acid sequences existing as genes, or parts of
genes, in a genome, plasmid, or other vector, such as by
restriction endonuclease digestion of larger nucleic acid
fragments and strand separation or by enzymatic synthesis using
a nucleic acid template.

De novo chemical synthesis of oligonucleotides can be
carried out, for example, by the phosphotriester method
described by Matteucci et al., J. Am. Chem. Soc., 103:3185
(1981), or as described in U.S. Patent No. 4,356,270, the
disclosures of which are hereby incorporated by reference. A
particularly preferred method is the phosphoramidite method using
commercial automated synthesizers, such as the ABI automated
synthesizer by Applied Biosystems, Inc. (Foster City, CA).
Oligonucleotides can be purified after synthesis using
published procedures as described by Miller et al., J. Biol.
Chem., 255:9659 (1980). Thereafter, complementary
oligonucleotides are hybridized to form double-stranded DNA
segments that are TGF-β response elements. Particularly
preferred chemically-synthesized oligonucleotides are described
in Example 1C and the sense strands of which are listed in SEQ
ID NOs 14-17, as described above.
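
Because only the sense strand of each element is listed, the
complementary oligonucleotide needed for hybridization has to be
derived from it. The following Python sketch shows one way to compute
the complementary strand, written 5' to 3', from a sense strand. The
12-base sequence and the function names are hypothetical and invented
for illustration; the sequence is not one of those listed in SEQ ID
NOs 14-17, and any restriction-site overhangs used for cloning are not
modeled.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sense):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sense.upper()))


def anneal(sense):
    """Pair a sense oligonucleotide with its complementary strand."""
    return sense.upper(), reverse_complement(sense)


# Hypothetical 12-base sense strand, invented for illustration only.
sense_oligo = "GAGTCACCTGCA"
top, bottom = anneal(sense_oligo)
```

Applying `reverse_complement` twice returns the original sense
strand, which is a convenient sanity check before ordering a pair of
oligonucleotides.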

Derivation of a TGF-β response element from nucleic acids
involves the cloning of a nucleic acid into an appropriate host
by means of a cloning vector, replication of the vector and
therefore multiplication of the amount of the cloned nucleic
acid, followed by isolation of subfragments of the cloned
nucleic acids. For a description of subcloning nucleic acid
fragments, see Sambrook et al., Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor Laboratory, pp 390-401
(1982); and see U.S. Patent Nos. 4,416,988 and 4,403,036.

In one embodiment, TGF-β response elements are obtained by
restriction digestion of cloned vectors containing the PAI-1
promoter as described in Examples 1A and 1C. Particularly
preferred nucleotide sequences containing TGF-β response
elements as well as the minimal promoter sequence obtained in
this manner include nucleotide sequences corresponding to the
nucleotide positions in the PAI-1 promoter sequence from -1481
to +76, specifically a Kpn I/Eco RI digest, and -800 to +76,
specifically a Hind III/Eco RI digest.
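
The double digests named above can be pictured as locating the two
recognition sites that bracket the desired promoter fragment. The
Python sketch below uses the standard recognition sequences of Kpn I
(GGTACC), Hind III (AAGCTT) and Eco RI (GAATTC) to find those sites
in a sequence and return the span between them. It is a simplified
illustration: the toy sequence and function names are hypothetical,
and the exact cut positions within each site, and the resulting
overhangs, are deliberately not modeled.

```python
RECOGNITION_SITES = {
    "Kpn I": "GGTACC",
    "Hind III": "AAGCTT",
    "Eco RI": "GAATTC",
}


def find_sites(sequence, site):
    """Return 0-based start positions of every occurrence of site."""
    sequence = sequence.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def excise(sequence, upstream_enzyme, downstream_enzyme):
    """Return the span from the first upstream site through the first
    downstream site that follows it, or None if either site is absent."""
    up = find_sites(sequence, RECOGNITION_SITES[upstream_enzyme])
    if not up:
        return None
    down_site = RECOGNITION_SITES[downstream_enzyme]
    down = [p for p in find_sites(sequence, down_site) if p > up[0]]
    if not down:
        return None
    return sequence[up[0]:down[0] + len(down_site)].upper()


# Hypothetical insert, invented for illustration only.
toy_insert = "ccAAGCTTacgtacgtGAATTCgg"
fragment = excise(toy_insert, "Hind III", "Eco RI")
```

With a real promoter clone, a check that each enzyme has only one
site in the region of interest would be needed before relying on the
first match, as this sketch does.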

In an additional embodiment, in the practice of this
invention, it is not necessary that the TGF-β response element
nucleotide sequence be known in order to obtain a TGF-β
response element capable of being activated by TGF-β. To that
end, contemplated for use in this invention are TGF-β response
elements obtained from promoter regions of other genes that can
be determined to contain TGF-β response elements using the
methods of this invention.

D. TGF-β Responsive Plasmid Expression Vectors

The present invention contemplates TGF-β responsive
plasmid expression vectors in substantially pure form capable
of causing expression of an indicator molecule in a eucaryotic
cell. The term "TGF-β responsive" identifies an expression
vector of this invention that by its composition contains TGF-β
response elements that are activated by TGF-β, mediated through
a TGF-β response element specific transcription factor as
described in Section C. Vectors capable of directing the
expression of genes to which they are operatively linked are
referred to herein as "expression vectors".

As used herein, the term "vector" refers to a nucleic acid
molecule capable of transporting between different genetic
environments another nucleic acid to which it has been
operatively linked. One type of preferred vector is an
episome, i.e., a nucleic acid capable of extra-chromosomal
replication. Preferred vectors are those capable of autonomous
replication and/or expression of nucleic acids to which they
are linked.

A TGF-β expression vector of this invention is a circular
double-stranded plasmid that contains at least the following
elements: 1) a regulatory region having at least one TGF-β
response element as defined in Section C, where the regulatory
region is operatively linked to a promoter; and 2) a structural
region downstream of the promoter that contains a gene coding
for an indicator molecule of this invention.

In a separate embodiment, a TGF-β expression vector also
contains a gene, the expression of which confers a selective
advantage, such as drug resistance, to the eucaryotic host
cell when introduced or transformed into those cells. A
typical eucaryotic drug resistance gene confers resistance to
neomycin, also referred to as G418 or Geneticin.

The choice of vector to which the regulatory region,
promoter, and structural region of the present invention are
operatively linked depends directly, as is well known in the
art, on the functional properties desired, e.g., replication or
protein expression, and the host cell to be transformed, these
being limitations inherent in the art of constructing
recombinant DNA molecules.

In preferred embodiments, the vector utilized includes
procaryotic sequences that facilitate the propagation of the
vector in bacteria, i.e., a DNA sequence having the ability to
direct autonomous replication and maintenance of the
recombinant DNA molecule extra-chromosomally when introduced
into a bacterial host cell. Such replicons are well known in
the art. In addition, the TGF-β expression vector of this
invention includes one or more transcription units that are
expressed only in eucaryotic cells.

The eucaryotic transcription unit consists of noncoding
sequences and sequences encoding selectable markers. The
expression vectors of this invention also contain distinct
sequence elements that are required for accurate and efficient
polyadenylation, referred to as PA1, 2 and 3 and as shown in
Figure 1. In addition, splicing signals for generating mature
mRNA are included in the vector. The eucaryotic TGF-β
responsive expression vectors contain viral replicons, the
presence of which provides for an increase in the level of
expression of cloned genes. A preferred replication sequence
is provided by the simian virus 40 or SV40 papovavirus.

Operatively linking refers to the covalent joining of
nucleotide sequences, preferably by conventional phosphodiester
bonds, into one strand of DNA, whether in single- or
double-stranded form. Moreover, the joining of nucleotide
sequences results in the joining of functional elements such as
response elements in regulatory regions with promoters and
downstream structural regions as described herein.

A preferred eucaryotic expression vector of this invention
as prepared in Example 1 contains a regulatory region having
TGF-β response elements derived from the 5' promoter end of the
human plasminogen activator inhibitor type 1 (PAI-1) gene
operatively linked to a PAI-1 minimal promoter and a downstream
structural region containing a gene coding for an indicator
polypeptide, preferably luciferase.

Exemplary TGF-β responsive expression vectors include the
following expression vectors, the designations of which are
indicated along with the corresponding SEQ ID NO in which the
sense strand of the expression vector is listed where the first
nucleotide of the double-stranded circular vector is the middle
"T" nucleotide present in the Eco RI restriction site as
described in Example 1: 1) p800neoLuc (SEQ ID NO 1); 2)
p800/636neoLuc (SEQ ID NO 2); 3) p56neoLuc (SEQ ID NO 3); 4)
p674neoLuc (SEQ ID NO 4); 5) p743neoLuc (SEQ ID NO 5); 6)
p732neoLuc (SEQ ID NO 6); 7) p56Luc (SEQ ID NO 7); 8) p674Luc
(SEQ ID NO 8); 9) p743Luc (SEQ ID NO 9); and 10) p732Luc (SEQ
ID NO 10).

The exemplary TGF-β expression vectors of this invention
are derived from the starting cloning expression vector,
designated p19Luc, as described in Example 1. The nucleotide
sequence of the sense strand of an Eco RI-linearized p19Luc
vector is listed in the Sequence Listing as SEQ ID NO 21.

A further embodiment of this invention is the preparation
of TGF-β responsive expression vectors having altered
arrangements of and selected types of TGF-β response elements
in the regulatory region. To that end, p19Luc and the
p19Luc-derived p39Luc expression cloning vectors, both of which
are described in Example 1, are vectors that allow for the
construction of TGF-β responsive vectors having any selected
regulatory region operatively ligated to a selected promoter.
Therefore, any regulatory region of any length containing one
or more TGF-β response elements can be paired with any
promoter, such as, but not limited to, the non-TGF-β responsive
PAI-1 or heterologous HBV promoter used herein, to prepare
TGF-β responsive expression vectors that provide for the
quantitation of inducing TGF-β.

In a related embodiment, in addition to the construction
methods detailed herein, other methods of preparing
p19Luc-derived expression vectors having TGF-β response elements
and promoters are familiar to one of ordinary skill in the art of
vector construction and are described by Ausubel et al., In
Current Protocols in Molecular Biology, Wiley and Sons, New
York (1993) and by Sambrook et al., Molecular Cloning: A
Laboratory Manual, Cold Spring Harbor Laboratory, 1989.

1. Plasmid Vectors for Stable Transformations

In practicing one aspect of this invention, a
preferred embodiment is a TGF-β responsive expression vector
having a gene encoding a selectable marker providing for
stably transformed cells. Stably transformed cells provide a
reproducible source for practicing the methods of this
invention over a course of time. A preferred
selectable marker gene is the gene conferring neomycin
resistance. Such a gene encoding the selectable marker was
derived from an expression vector, designated pMAMneo, as
described in Example 1. The nucleotide sequence of the
neomycin-resistance conferring gene is listed in SEQ ID NO 20.

In one embodiment, a TGF-β responsive expression vector
contains a first nucleotide sequence comprising a regulatory
region that includes at least one TGF-β inducible response
element operatively linked to a promoter, a second nucleotide
sequence comprising a structural region downstream of the
promoter and coding for an indicator molecule, and a third
nucleotide sequence comprising a gene encoding a selectable
marker for the selection of a stably transformed cell, where
the response element is capable of inducing dose-dependent
luciferase activity and the structural region codes for
luciferase.

Preferred expression vectors containing the
neomycin-resistance conferring gene include the following
designations followed in parentheses by the corresponding SEQ ID
NO in which the sense strand of each Eco RI-linearized vector is
listed according to the convention adopted in this invention for
listing vector sequences: 1) p800neoLuc (SEQ ID NO 1); 2)
p800/636neoLuc (SEQ ID NO 2); 3) p56neoLuc (SEQ ID NO 3); 4)
p674neoLuc (SEQ ID NO 4); 5) p743neoLuc (SEQ ID NO 5); and 6)
p732neoLuc (SEQ ID NO 6).

In a further embodiment, the plasmid expression vectors of
this invention contain TGF-β inducible response elements that
correspond to a nucleotide sequence listed in SEQ ID NOs 11-17
as described in Section C.

Preferred promoters for use in the TGF-β expression
vectors of this invention for stably transforming cells as well
as for transient transformation are the PAI-1 minimal promoter
sequence and the hepatitis B virus minimal promoter sequence,
the sense sequences of which are respectively listed in SEQ ID
NOs 18 and 19. Contemplated for use in this invention are
promoters that are not responsive to TGF-β. The selection of
alternative promoters is within the scope of one having
ordinary skill in the art.

This invention contemplates additional TGF-β expression
vectors for stably transforming cells that can be designed to
have regulatory regions that contain alternative TGF-β response
elements and promoters.

a. Regulatory Region

The regulatory region of a TGF-β expression
vector of this invention contains at least one TGF-β response
element as described herein and in Section C of this invention.
As contemplated for use in this invention, the regulatory
region of a TGF-β expression vector can range in length from 5
to 2000 base pairs, preferably 15 to 1500 base pairs, and can
contain more than one TGF-β response element in any orientation
and arrangement. Thus, if two or more TGF-β response elements
are present in a regulatory region, they may be contiguous with
one another or separated by an intervening nucleotide sequence.
The design and construction of such arrangements are well known
to one of ordinary skill in the art of oligonucleotide design
and synthesis and are described by Sambrook et al., Molecular
Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory,
pp 390-401 (1982).
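
The constraints stated above on a regulatory region (at least one
response element, an overall length of 5 to 2000 base pairs, and
elements that are either contiguous or separated by intervening
sequence) can be written down as a small data model. The Python
sketch below is an illustration under stated assumptions: the class
names, field names and toy sequences are hypothetical and are not
taken from the Examples or the Sequence Listing, and element
orientation is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RegulatoryRegion:
    response_elements: List[str]  # element sequences, 5' to 3'
    spacers: List[str] = field(default_factory=list)  # intervening DNA

    def sequence(self):
        """Join elements, placing spacer i (if any) after element i."""
        parts = []
        for i, element in enumerate(self.response_elements):
            parts.append(element)
            if i < len(self.spacers):
                parts.append(self.spacers[i])
        return "".join(parts)

    def validate(self, min_bp=5, max_bp=2000):
        """Check the element count and overall length; return the length."""
        if not self.response_elements:
            raise ValueError("at least one response element is required")
        length = len(self.sequence())
        if not min_bp <= length <= max_bp:
            raise ValueError(f"regulatory region of {length} bp is out of range")
        return length


@dataclass
class ExpressionVector:
    regulatory_region: RegulatoryRegion
    promoter: str
    indicator_gene: str
    selectable_marker: Optional[str] = None  # e.g. "neo" for stable lines


# Toy sequences, invented for illustration only.
region = RegulatoryRegion(["ACGTACGTAC", "TTGACCAGTA"], spacers=["GG"])
vector = ExpressionVector(region, promoter="PAI-1 minimal",
                          indicator_gene="luciferase",
                          selectable_marker="neo")
region_length = region.validate()
```

Passing `min_bp=15, max_bp=1500` to `validate` applies the preferred
range instead of the broader one.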

Preferred TGF-β response elements present in the
regulatory region of a TGF-β expression vector are derived from
the PAI-1 promoter and have the nucleotide sequences listed in
the Sequence Listing in SEQ ID NOs 11-17. The PAI-1-derived
TGF-β response elements corresponding to SEQ ID NOs 11-17 have
the respective designations with the nucleotide regions
corresponding to the PAI-1 promoter indicated in parentheses:
1) SEQ ID NO 11 = 1500 (-1481 to -40); 2) SEQ ID NO 12 = 800
(-800 up to -40); 3) SEQ ID NO 13 = 800/636 (-800 up to -636); 4)
SEQ ID NO 14 = 56 (-56 to -41); 5) SEQ ID NO 15 = 674 (-674 to
-650); 6) SEQ ID NO 16 = 743 (-743 to -708); and 7) SEQ ID NO
17 = 732 (-732 to -708).



b. Structural Region 

A plasmid vector of the present invention
contains a structural region having a nucleotide sequence that
encodes an indicator molecule. The structural region is
operatively linked to the regulatory region such that the
inducible promoter of the regulatory region, under the
inducible control of the TGF-β response element, controls
transcription and expression of the indicator molecule. Thus,
upon induction of the TGF-β response element, the regulatory
region drives transcription and thereby expression of the
indicator molecule, resulting in a detectable event in the cell,
which event can be measured by detection of the amount of the
expressed indicator molecule. In other words, the response
element is capable of inducing the expression of the indicator
molecule by virtue of its controlling expression of the
indicator through the promoter to which the response element is
operatively linked. Typically, the structural region is
"downstream" of the regulatory region in the plasmid, and
positioned to be under the direct control of the regulatory
region. Other configurations can be utilized so long as the
induction of the TGF-β response element results in the
expression of the indicator polypeptide. Exemplary and
preferred configurations are described in the Examples.

The term "indicator molecule" as used in this invention
refers to a molecule encoded by a reporter gene, the expression
of which in the expression vectors of this invention results
in a detectable, measurable protein, polypeptide, enzyme and the
like. Alternative expressions for indicator molecule are
reporter molecule, reporter polypeptide, indicator protein,
indicator polypeptide and the like. In preferred embodiments,
the indicator molecule is a protein.

There are a variety of indicator polypeptides
suitable for use in the present invention, and the invention
need not be limited to any particular indicator. A
preferred indicator polypeptide is luciferase encoded by the
firefly luciferase gene. Use of the luciferase gene for
expression of luciferase has been described by Gould et al.,
Biochem., 7:5-13 (1988) and Brasier et al.,
BioTechniques, 7:1116-1122 (1989). A preferred structural region
includes a nucleotide sequence having the sequence
characteristics of the luciferase gene shown in SEQ ID NO 21.

Alternative embodiments include indicator proteins such as
β-galactosidase and chloramphenicol acetyltransferase (CAT).
Use of β-galactosidase and CAT as reporter molecules has
been respectively described by Luskin et al., Neuron, 1:635-647
(1988) and Gorman et al., Mol. Cell. Biol., 2:1044-1051 (1982).

Associated with the use of an indicator molecule in
quantifying TGF-β are means for measuring the indicator
molecule. A preferred method for detecting the luciferase
indicator molecule is the use of a luminometer commercially
available from Dynatech Laboratories Inc., Chantilly, VA as
described in Example 3A and analyzed according to
manufacturer's instructions. For detecting CAT activity, a
simple phase-extraction assay has been developed and described
by Seed et al., Gene, 67:271-277 (1988), the disclosure of
which is hereby incorporated by reference. Alternative
preferred methods for detecting CAT activity are described in
Current Protocols in Molecular Biology, Eds., Ausubel et al.,
Unit 9.0, John Wiley & Sons (1993). Detection of
β-galactosidase activity is performed in activity assays
performed essentially as described by Miller, Experiments in
Molecular Genetics, Cold Spring Harbor Laboratory, New York,
(1972), the disclosure of which is hereby incorporated by
reference. With β-galactosidase additional reagents are
required to visualize its presence following induced
expression. Such additional reagents for β-galactosidase
include o-nitrophenyl-β-D-galactopyranoside and the like for
the development of a color reaction measured by absorbance at
wavelengths of 500 and 420.

c. Selectable Marker Gene

In preferred embodiments, the plasmid vector of the present
invention includes a gene that encodes a selectable marker that
is effective in a eucaryotic cell, preferably a drug resistance
selection marker. A preferred drug resistance selection marker
is a gene whose expression results in neomycin resistance,
i.e., the neomycin phosphotransferase (neo) gene [Southern et
al., J. Mol. Appl. Genet., 1:327-341 (1982)], or a gene whose
expression results in kanamycin resistance, i.e., the chimeric
gene containing the nopaline synthetase promoter, Tn5 neomycin
phosphotransferase II and the nopaline synthetase 3'
non-translated region described by Rogers et al., Methods for
Plant Molecular Biology, A. Weissbach and H. Weissbach, eds.,
Academic Press, Inc., San Diego, CA (1988). Other selectable
markers which are utilizable in eucaryotic cells can be
utilized in the present vectors and methods, and therefore the
invention need not be limited to any particular selectable
marker. Thus, the invention contemplates the use of a
nucleotide sequence which confers a eucaryotic selection means,
including but not limited to genes for resistance to neomycin
and kanamycin.

A preferred nucleotide sequence defining a selectable
marker gene is a nucleotide sequence having the sequence
characteristics of the neomycin resistance gene shown in SEQ ID
NO 20.

The use of a selectable marker for eucaryotic cells
provides the advantage of producing stably transformed cells,
as discussed herein. Thus, one can produce a eucaryotic cell
line containing a plasmid vector of this invention for use in
the present methods wherein all the cells of the culture are
selected to be uniform and each contains intact plasmid vector,
thereby assuring that all of the eucaryotic cells in the
culture are substantially similar in responsiveness to TGF-β,
thereby increasing the reliability and sensitivity of the
assay.

In addition, preferred embodiments that include a
procaryotic replicon also include a gene whose expression
confers a selective advantage, such as drug resistance, to the
bacterial host cell when introduced into those transformed
cells. Typical bacterial drug resistance genes are those that
confer resistance to ampicillin or tetracycline.

Those vectors that include a procaryotic replicon also
typically include convenient restriction sites for insertion of
a recombinant DNA molecule of the present invention. Typical
of such vector plasmids are pUC8, pUC9, pBR322, and pBR329
available from BioRad Laboratories (Richmond, CA), pPL and
pKK223 available from Pharmacia (Piscataway, NJ), and
pBLUESCRIPT and pBS available from Stratagene (La Jolla, CA).
A vector of the present invention may also be a Lambda phage
vector, including those Lambda vectors described in Molecular
Cloning: A Laboratory Manual, Second Edition, Maniatis et al.,
eds., Cold Spring Harbor, NY (1989).

Plasmid vectors for use in the present invention are also
compatible with eucaryotic cells. Eucaryotic cell expression
vectors are well known in the art and are available from
several commercial sources. Typically, such vectors provide
convenient restriction sites for insertion of the desired
recombinant DNA molecule, and further contain promoters for
expression of the encoded genes which are capable of expression
in the eucaryotic cell, as discussed earlier. Typical of such
vectors are pSVO and pKSV-10 (Pharmacia), pPW-l/PML2d
(International Biotechnology, Inc.), and pTDT1 (ATCC, No.
31255).

2. Plasmid Vectors for Co-transformation and
Transient Transformation

This invention contemplates the use of TGF-β responsive
expression vectors having regulatory, promoter and structural
regions but lacking a gene encoding a selectable marker. In
other words, in practicing this invention, TGF-β expression
vectors for transient transformation of eucaryotic cells are
contemplated. This embodiment allows for an alternative to
stable transformation of cells for use in practicing the
methods of this invention. Transiently transformed cells
produced as described in Example 2D are useful for performing
TGF-β assays when having stably transformed cells is not
required or necessitated. As described in Example 4,
transiently transformed cells are useful for determining the
nucleotide sequence of TGF-β response elements as well as for
quantifying the amount of TGF-β present in a heterogeneous or
homogeneous liquid sample.

Preferred TGF-β expression vectors used for transiently
transforming eucaryotic cells include the following vectors,
shown with their designations and SEQ ID NOs in which the sense
strand of the double-stranded Eco RI-linearized vectors is
listed: 1) p56Luc (SEQ ID NO 7); 2) p674Luc (SEQ ID NO 8); 3)
p743Luc (SEQ ID NO 9); and 4) p732Luc (SEQ ID NO 10).

The invention further describes TGF-β responsive plasmids
lacking a selectable marker gene and having the identifying
characteristics of plasmids that have been deposited with the
American Type Culture Collection, Rockville, MD, having the
assigned ATCC Accession Numbers 75627, 75628 and 75629, the
plasmids of which respectively correspond to the Eco
RI-linearized sense strand nucleotide sequences listed in SEQ
ID NOs 8-10.

In an additional embodiment, this invention describes the
co-transformation of TGF-β expression vectors for transient
transformation in conjunction with a second expression vector
from which a selectable marker is expressed. A preferred
selectable marker expressing plasmid is RSVneo as described in
Example 2C. This approach makes it possible to prepare stably
transformed cells using a vector that by itself confers only
transient transformation. The advantage of this approach is
that further vector constructions for inserting selectable
marker genes can be avoided, thereby providing stably
transformed cells for use in practicing this invention when
necessitated. Thus, eucaryotic cells that have been
co-transformed with a transient TGF-β expression vector and a
second plasmid such as RSVneo provide an alternative approach
to creating stably transformed eucaryotic cells.

Any transient TGF-β expression vector of this invention
can be used in this context. A preferred co-transformed
eucaryotic cell is the cell line Hep3B that has been
co-transformed with RSVneo and the pl500Luc expression vector
having the TGF-β response element in SEQ ID NO 11. This stably
transformed cell line has been deposited with the American Type
Culture Collection, Rockville, MD and has been assigned ATCC
Accession Number CRL 11508.

With the teachings of this invention, additional TGF-β
expression vectors for transiently transforming cells can be
designed to have regulatory regions that contain alternative
TGF-β response elements and promoters. In a further
embodiment, these additional vectors can be used to prepare
stably transformed cells through the use of the
co-transformation approach.



3. Recipient Cells for Transformations

Insofar as the invention describes plasmid vectors for use
in the present invention, the invention also contemplates a
eucaryotic cell containing a plasmid vector of the present
invention.

A eucaryotic cell suitable for use can be any eucaryotic
cell which expresses a TGF-β receptor on its cell surface and
is capable of induction of a TGF-β response element. There are
a variety of means to identify a suitable eucaryotic cell,
including, but not limited to, transformation by a plasmid
vector of this invention, followed by assay for expression of
the indicator polypeptide upon challenge by TGF-β.

In a preferred embodiment, this invention contemplates the
use of mammalian cells. Preferred mammalian cells include mink
lung epithelial cells, HeLa cells, Chinese hamster ovary cells,
Hep3B cells, GM7373 cells, NIH 3T3 cells, and the like cells.
These and other suitable mammalian cells are widely available.
Suitable mammalian cells for use in the invention can also be
obtained from the American Type Culture Collection (ATCC;
Rockville, MD).

Introduction of a plasmid vector of the present invention
into a eucaryotic cell can be accomplished by a variety of
methods well known in the art, including, but not limited to,
transfection, transformation, electroporation, microinjection,
liposome fusion, and the like introduction methods. Such
methods are well known and are not to be considered essential
to the invention. Furthermore, the introduction of the plasmid
vector can be transient or stable.

A transient introduction is one where there is no
selection to maintain the plasmid vector within the host
eucaryotic cell through multiple rounds of cell division.
Therefore, the assay is to be conducted in a short time period
after introduction, and before several rounds of cell division.
Stable introduction of plasmid involves the culturing of the
cell under conditions that select for the maintenance of the
plasmid vector, typically by the use of a gene on the plasmid
that encodes a selectable marker, as described further herein.

Following the introduction of the plasmid vector, the
resulting eucaryotic cell containing a plasmid vector is used
in the assay methods described herein. A preferred eucaryotic
cell contains a plasmid vector of this invention, which plasmid
vector comprises a nucleotide sequence having a TGF-β response
element and a gene encoding an indicator polypeptide, wherein
the plasmid is capable of expression of the indicator
polypeptide in response to TGF-β induction. Particularly
preferred are eucaryotic cells that contain a plasmid vector
having a nucleotide sequence with the nucleotide sequence
characteristics of the TGF-β response element selected from the
group consisting of the sequences shown in SEQ ID NOs 11-17. A
particularly preferred eucaryotic cell contains a plasmid
vector having a nucleotide sequence with the nucleotide
sequence characteristics of the plasmid vector selected from
the group consisting of the sequences shown in SEQ ID NOs 1-10.

A preferred eucaryotic cell described further herein is
Hep3B stably transformed with the plasmid vector pl500Luc,
referred to as LUCI, and having the ATCC Accession No. CRL
11508.

E. Methods for Quantifying TGF-β

The present invention describes methods for detecting the
presence, and preferably quantifying the amount, of TGF-β in a
liquid sample containing either purified TGF-β or TGF-β in a
heterogeneous admixture. Such a method is also referred to
herein as a TGF-β assay. The assay system provides for the
quantification of TGF-β through the expression of an indicator
polypeptide which is expressed in levels proportional to the
amount of TGF-β being detected.

The assay is a highly sensitive and specific,
non-radioactive assay for detecting mature (active) TGF-β. When
compared to the sensitive and widely used proliferation-based
mink lung epithelial cell (MLE cell) method for measuring TGF-β
concentration, the TGF-β assay method of this invention is
more rapid, has comparable sensitivity, and has a greater
detection range. Specificity of this novel assay was also
higher, as evidenced by its relative insensitivity to factors
such as epidermal growth factor (EGF) and basic fibroblast
growth factor (bFGF), which can greatly affect other assays.
The use of a TGF-β response element, such as the truncated
PAI-1 promoter, that does not respond to other growth
modulators such as platelet-derived growth factor (PDGF) found
in biological samples provides the added advantage that the
method of this invention can be used in conditions where other
bioassays are difficult to interpret. Because of its large
range and specificity, the rapid, sensitive, non-radioactive,
easily performed assay method of this invention is useful in
determining active TGF-β concentrations in complex solutions.

Thus, the present invention overcomes the limitations of
existing methods used to quantify the amount of TGF-β in a
liquid sample. This invention contemplates a method for
quantifying the amount of TGF-β in a sample using a system
comprising a TGF-β responsive cell containing an expression
vector having a TGF-β response element and an indicator
molecule. Following TGF-β induction, transcription results in
the expression of an indicator molecule, the amount of which
allows for the measurement of the amount of TGF-β responsible
for the induction.

TGF-β receptor-bearing cells are transfected with a TGF-β
responsive expression vector of this invention, and are
subsequently exposed to TGF-β, whereupon the TGF-β
receptor-bearing cells activate the TGF-β response element in
the vector, which results in the concomitant expression of the
indicator polypeptide. The resulting expressed indicator
polypeptide is then measured in a manner depending upon the
indicator polypeptide employed.

The measured indicator polypeptide resulting from
activation by TGF-β in the test liquid sample is then compared
to a standardized reference curve produced using known amounts
of TGF-β.

In particular, one embodiment of the invention
contemplates a method for quantifying the amount of TGF-β in a
liquid sample, which method comprises:

(a) incubating the liquid sample together with eucaryotic
cells that contain a TGF-β responsive expression vector having
a gene encoding an indicator polypeptide for a predetermined
time period sufficient for the eucaryotic cells to express a
detectable amount of the indicator polypeptide;

(b) measuring the amount of the indicator polypeptide
expressed during the time period; and

(c) determining the amount of TGF-β present in the sample
by comparing the measured amount of the indicator polypeptide
against a reference curve.

Preferably, the reference curve represents a quantitative
relationship derived from a series of measured amounts of
indicator polypeptide produced from a series of known
concentrations of TGF-β.

The standardized reference curve is obtained from parallel
assays performed by exposing similarly transfected cells to a
range, usually in serial dilution, of known (measured) amounts
of one or more of the known TGF-β isoforms. The resulting
expressed indicator polypeptide is then determined by direct
detection of the indicator polypeptide. A reference curve is
then generated by plotting the measured amount of expressed
indicator polypeptide against the known range of inducing
amounts of TGF-β. The amount of unknown TGF-β in the test
liquid sample is then determined by extrapolating the measured
amount of test indicator polypeptide to the reference curve.
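
The reference-curve step described above can be sketched in a few
lines of Python. This is only an illustrative sketch, not the
procedure of the Examples: the function name
interpolate_concentration, the choice of piecewise-linear
interpolation between neighbouring standards, and all numerical
values are assumptions made for the illustration.

```python
from bisect import bisect_left


def interpolate_concentration(standards, signal):
    """Read a TGF-beta concentration off a reference curve.

    standards: (known_concentration, measured_signal) pairs obtained from
               the parallel assays; signal must rise with concentration.
    signal:    indicator signal measured for the test sample.
    Uses straight-line interpolation between the two bracketing standards
    (an assumed, simple model of the reference curve).
    """
    points = sorted(standards, key=lambda p: p[1])
    signals = [s for _, s in points]
    if not signals[0] <= signal <= signals[-1]:
        raise ValueError("signal lies outside the reference curve")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return points[i][0]
    (c0, s0), (c1, s1) = points[i - 1], points[i]
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)


# Hypothetical standards: (pM TGF-beta, relative light units).
EXAMPLE_STANDARDS = [(0.0, 100.0), (1.0, 300.0), (10.0, 2100.0)]

if __name__ == "__main__":
    print(interpolate_concentration(EXAMPLE_STANDARDS, 1200.0))
```

A sample whose signal falls outside the range spanned by the
standards is rejected rather than estimated, since the curve
carries no information there; such a sample would normally be
diluted and re-assayed.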

The use of standard curves in quantifying the amount of
protein in a liquid sample in general has been described by
Lowry et al., J. Biol. Chem., 193:265-275 (1951), the
disclosure of which is hereby incorporated by reference. As
shown in the Examples herein, the TGF-β assay of this invention
allows for the measurement of TGF-β from the expression and
subsequent detection of an indicator polypeptide over a
concentration range from less than 5 picograms/ml (pg/ml),
equivalent to 0.2 pM, up to 10 ng/ml, equivalent to 400 pM (or
0.4 nM). The dose-dependent response to TGF-β is linear
from 0.2 pM up to 100 pM depending on the assay conditions.
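
The mass and molar figures quoted above can be checked with a
short calculation. The sketch below assumes the 25 kD molecular
weight given for the TGF-β homodimer in the Background; the
function name pg_per_ml_to_pM is hypothetical. On that
assumption, 5 pg/ml works out to 0.2 pM and 10 ng/ml to 0.4 nM,
i.e., 400 pM.

```python
# Assumed molecular weight of the TGF-beta homodimer (25 kD), in g/mol.
TGF_BETA_MW_DA = 25_000.0


def pg_per_ml_to_pM(pg_per_ml, mw_da=TGF_BETA_MW_DA):
    """Convert a mass concentration in pg/ml to a molar concentration in pM.

    1 pg/ml = 1e-9 g/L; dividing by g/mol gives mol/L; 1 mol/L = 1e12 pM,
    so the overall factor is 1000 / molecular weight.
    """
    return pg_per_ml * 1000.0 / mw_da


if __name__ == "__main__":
    print(pg_per_ml_to_pM(5.0))        # lower end quoted in the text
    print(pg_per_ml_to_pM(10_000.0))   # 10 ng/ml expressed in pg/ml
```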

As described further herein, any of a variety of indicator
polypeptides can be utilized in the present methods, and the
invention is not to be construed as limited to any particular
indicator polypeptide. However, a preferred embodiment
utilizes a chemiluminescent molecule, more preferably
luciferase, as the indicator polypeptide, and therefore the
examples herein using luciferase are to be considered exemplary
of all indicator polypeptides and of preferred embodiments.
The level of expressed luciferase is easily and conveniently
measured using a luminometer as described herein.

In another embodiment of the present invention, the assay
method for quantifying TGF-β in complex solutions is practiced
generally as described above, but with the additional use of a
neutralizing anti-TGF-β monoclonal antibody admixed with the
test liquid sample in assays run in parallel to untreated test
liquid samples as described in Example 3B. These control
assays are used to determine if other molecules are present in
the test sample that can affect the assay through either
inhibition or activation of other regions of the TGF-β response
element. For example, conditioned medium obtained from cell
cultures and body fluids contains growth factors and DNA
binding proteins that function as transcriptional activators or
inhibitors. If a corresponding response element for an
additional non-TGF-β activator is present in the expression
vector, the binding of the activator to the response element
may cause enhanced or diminished expression of the indicator
polypeptide. By antibody neutralization of the TGF-β in the
test sample, any residual measured indicator polypeptide can
then be ascribed to non-TGF-β activation.

The shorter TGF-β response elements used in the expression
vector systems of this invention are less likely to have
non-TGF-β response elements, as shown in Examples 3E and 3F.
Thus, the use of parallel antibody control assays to allow for
a determination of the amount of luciferase produced from only
TGF-β activation is preferred when using expression vectors
that exhibit responsiveness to transcription factors other than
those induced by TGF-β. Moreover, while the TGF-β assay is not
generally isoform specific, the assay can be made TGF-β
isoform-specific by the use of the appropriate standard
reference curves and parallel assays with neutralizing
antibodies immunospecific to a particular TGF-β isoform
species, thereby allowing for quantification of unique TGF-β
isoforms.

Thus, in another embodiment of the invention, a method
for quantifying the amount of transforming growth factor-β
(TGF-β) in a liquid sample is contemplated, the method
comprising:

(a) providing, in eucaryotic cells capable of expressing
an indicator molecule, a plasmid comprising, in the direction
of transcription, a regulatory region that includes at least
one TGF-β inducible response element that is operably linked to
a promoter, and a structural region downstream of the promoter,
where the response element is capable of inducing
dose-dependent indicator molecule activity and where the
structural region codes for the indicator molecule;

(b) incubating the liquid sample with the eucaryotic
cells for a predetermined time period sufficient for the
eucaryotic cells to express a detectable amount of the
indicator molecule;

(c) measuring the amount of the indicator molecule
expressed during the time period; and

(d) comparing the measured amount of the indicator
molecule produced in step (c) with the amount of indicator
molecule produced in a control assay performed according to
steps (a) through (c) by treating the liquid sample with an
anti-TGF-β antibody to obtain a net measured amount of the
indicator molecule induced by TGF-β.
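
The comparison in step (d) amounts to a subtraction, which can
be sketched as follows. This is an illustrative assumption about
how the comparison might be carried out, not the analysis used
in the Examples; the function name net_tgf_beta_signal, the use
of replicate means, and the sample values are hypothetical.

```python
from statistics import mean


def net_tgf_beta_signal(untreated, antibody_treated):
    """Net indicator signal attributable to TGF-beta.

    untreated:        replicate signals from the untreated liquid sample.
    antibody_treated: replicate signals from the same sample admixed with
                      a neutralizing anti-TGF-beta antibody (control).
    Returns the mean untreated signal minus the mean control signal.
    """
    if not untreated or not antibody_treated:
        raise ValueError("each condition needs at least one replicate")
    return mean(untreated) - mean(antibody_treated)


if __name__ == "__main__":
    # Hypothetical luminometer readings in relative light units.
    print(net_tgf_beta_signal([1000.0, 1100.0], [200.0, 300.0]))
```

The net value, rather than the raw signal, would then be read
against the reference curve.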

The use of a monoclonal antibody specific for TGF-β
provides particular advantages in practicing the invention.
First, one can use a variety of TGF-β response elements,
including those which exhibit responsiveness to factors in
addition to TGF-β, which activity is subtracted out by the use
of the control data obtained using the antibody treatment.
Second, one can correct for spurious induction or inhibition of
a TGF-β response element by factors other than TGF-β. The
analysis of comparative data produced by conducting the present
method both with and without anti-TGF-β antibody, for the
purpose of determining the level of TGF-β in a liquid sample,
can be conducted by a variety of statistical methods that are
not to be construed as limiting to the invention. Exemplary
comparative analyses are described in the Examples.

Contemplated for use with any of the above TGF-β assay
methods of this invention are plasmids having identifying
characteristics of plasmids on deposit with ATCC having the
ATCC Accession Numbers 75627, 75628 and 75629. Also
contemplated are eucaryotic cells that contain the TGF-β
response element having the nucleotide sequence in SEQ ID NO 11
where the cells correspond to cells on deposit with ATCC having
the ATCC Accession Number CRL 11508. In one embodiment, the
use of stably transformed eucaryotic cells is contemplated.

The invention describes plasmids for use in the methods
that comprise a nucleotide sequence corresponding to nucleotide
sequences listed in SEQ ID NOs 1-10. TGF-β inducible response
elements that comprise a nucleotide sequence corresponding to
nucleotide sequences listed in SEQ ID NOs 11-17 are also
described. Contemplated promoter nucleotide sequences are
listed in SEQ ID NOs 18 and 19.

A further embodiment of the methods of the invention
employs eucaryotic cells that are stably transformed cells
containing a plasmid having a gene encoding a selectable marker
for the selection of said stably transformed cells. The
invention describes such plasmids having nucleotide sequences
listed in SEQ ID NOs 1-6. The invention further describes a
stably transformed eucaryotic cell on deposit with ATCC having
ATCC Accession Number CRL 11508 containing the TGF-β response
element having the nucleotide sequence in SEQ ID NO 11.

An additional embodiment employs eucaryotic cells that are
transiently transformed with plasmids corresponding to the
nucleotide sequences listed in SEQ ID NOs 7-10.

The use of stably transformed cells is particularly
preferred because it provides uniformity and reproducibility to
the cell-based assay without the need for additional controls
for the efficiency of transformation typically associated with
methods using transient transformation. Stably transformed
cells do not require the use of an internal standard for
transformation efficiency, and all of the cells utilized are
typically uniformly transformed. Furthermore, the methods do
not require the additional step of transforming the cells
transiently because the stably transformed cell line is already
available.

The invention describes quantifying the amount of TGF-β in
a body fluid, in culture medium, in a tissue extract, and in
the like liquid samples. A further preferred embodiment is the
determination of the amount of a specific isoform of TGF-β,
specifically TGF-β1, TGF-β2 or TGF-β3, in a liquid sample.

In a preferred embodiment, this invention describes the
use of any eucaryotic host cell that contains a TGF-β receptor
and is capable of inducing a TGF-β response element upon
activation by TGF-β. Exemplary assays for measuring activation
by TGF-β and induction of a TGF-β response element are
described herein and can be used to identify candidate host
cells suitable for use in the present diagnostic methods. A
preferred host cell is a mammalian cell. Preferred mammalian
cells include mink lung epithelial (MLE) cells, particularly
clone C32 from MLE cells, HeLa cells, Chinese hamster ovary
(CHO) cells, Hep3B cells, GM7373 cells, NIH 3T3 cells, and the
like cells.

Conditions for incubating a eucaryotic cell in the present
methods are the same as general cell culture methods. Typical
cell culture media for culturing and incubating eucaryotic
cells include alpha-MEM, Eagle's MEM (having non-essential
amino acids), RPMI 1640 and Dulbecco's modified MEM (DMEM), all
of which are well known in the art. The culture medium
preferably contains 0.5 to 2% (v/v) serum, preferably a fetal
calf or fetal bovine serum (FCS or FBS). Cell culture
conditions include the use of cells plated at a density of
about 0.8 x 10^4 to about 3.2 x 10^4 cells per well of a
96-well tissue culture plate, preferably about 1.6 x 10^4 cells
per well. Cells are typically plated at the indicated density,
and allowed to grow until they reach a confluence density of
from about 70% confluent to about 1 day post-confluent, but
should preferably be allowed to grow after plating for a time
period sufficient for the cells to express detectable levels of
TGF-β receptor, which time period is typically about 0.5-24
hours, preferably about 1-5 hours, and more preferably about 3
hours.

After plating and culturing, the eucaryotic cells are
incubated under culturing conditions with culture medium that
includes a predetermined volume of a liquid sample believed to
contain TGF-β. The incubation time period is a time sufficient
for any TGF-β present in the liquid sample to interact with the
eucaryotic cell TGF-β receptor and thereby induce the TGF-β
response element and express the indicator polypeptide. The
time required for the expressed indicator polypeptide to
accumulate to detectable levels will vary with the choice of
indicator and method of detection, and can be predetermined.
However, typical incubation times for contacting the cell with
the liquid sample can range from 2 to 24 hours, preferably
about 6 to 22 hours, more preferably 10 to 20 hours, and
particularly about 14 hours. Particularly preferred culturing
and incubation conditions for use in the present methods are
described in the Examples.

The detection of TGF-β in liquid samples such as body
fluid or tissue extract samples is useful in following the
levels of TGF-β in patients experiencing a variety of
conditions where the TGF-β level is important to the clinician.
For example, TGF-β levels are significant in diseases
characterized by excessive fibrosis such as hepatic fibrosis
and the like, in proliferative conditions, in conditions where
there is an increase in collagen expression, and in the like
conditions where TGF-β is believed to participate. In
addition, there are many therapeutic uses of TGF-β, and
therefore, the present assay methods are useful for measuring
the therapeutic fate of administered TGF-β in patients being
treated therapeutically with TGF-β.



F. Diagnostic Methods and Kits

The present invention also contemplates a diagnostic
system in kit form for assaying the amount of TGF-β in a liquid
sample according to the present methods. The diagnostic kit
contains, in an amount sufficient for at least one assay, a
eucaryotic cell of this invention useful for practicing the
diagnostic methods for detection of TGF-β.

The kit can further contain a packaging material.
Packaging material can include container(s) for storage of the
materials of the kit, and can include a label or instructions
for use.

The kit can additionally contain an aliquot of reference
TGF-β for use in generating a standard reference curve using
the methods of the invention.

Thus, in preferred embodiments, a diagnostic kit includes,
in an amount sufficient for at least one assay, the following:
(a) packaging material; (b) eucaryotic cells contained within
the packaging material, where the cells are capable of
expressing an indicator molecule and contain a plasmid
comprising, in the direction of transcription, a regulatory
region that includes at least one TGF-β inducible response
element that is operatively linked to a promoter, and a
structural region downstream of said promoter, where the TGF-β
response element is capable of inducing dose-dependent
indicator molecule activity and the structural region codes
for said indicator molecule; and (c) an aliquot of TGF-β
contained within said packaging material, where the TGF-β is
used for generating a reference curve as described herein
representing a measured amount of the indicator molecule
produced from a known concentration of TGF-β.

As used herein, the term "packaging material" refers to a
solid matrix or material such as glass, plastic, paper, foil
and the like capable of holding within fixed limits eucaryotic
cells and an aliquot of TGF-β. Thus, for example, packaging
material can be a plastic vial used to contain eucaryotic cells
in growth medium to which liquid samples can be added for
activating the TGF-β responsive plasmid within the cells.
Packaging material can also be a glass vial in which an aliquot
of TGF-β is contained for use in generating a reference curve,
the latter of which is described in Section E.

As used herein, an "aliquot" of TGF-β refers to an amount
of TGF-β sufficient to generate a reference curve of this
invention. In preferred embodiments, the aliquot of TGF-β is
provided in the form of a substantially dry powder, i.e., in
lyophilized form, for subsequent reconstitution, or in the form
of a solution, i.e., a liquid dispersion. Preferably the
amount of powdered TGF-β is in the range of 25 nanograms (ng),
more preferably 125 ng to 625 ng, and most preferably 250 ng.
Preferably the amount of TGF-β in liquid solution is in the
range of 1 to 50 nanomolar (nM), more preferably 5 to 25 nM and
most preferably 10 nM. Preferred serial dilutions of TGF-β used
in generating the reference curve are described in Section E.
The TGF-β provided in the kit preferably includes each of the
three TGF-β isoforms as described in Section B.
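
The preferred powder and solution amounts can be related by a
short calculation. The sketch below assumes the 25 kD molecular
weight stated for the TGF-β homodimer in the Background; on that
assumption, a 250 ng aliquot is 10 picomoles, so reconstituting
it in 1 ml gives a 10 nM solution. The function names and the
two-fold dilution factor are hypothetical and chosen only for
illustration; the dilutions actually preferred are those of
Section E.

```python
# Assumed molecular weight of the TGF-beta homodimer (25 kD), in g/mol.
TGF_BETA_MW_DA = 25_000.0


def reconstitution_volume_ml(mass_ng, target_nM, mw_da=TGF_BETA_MW_DA):
    """Volume (ml) in which to dissolve mass_ng to reach target_nM."""
    nanomoles = mass_ng / mw_da              # ng / (g/mol) = nmol
    return nanomoles / target_nM * 1000.0    # nmol / (nmol/L) = L, then ml


def serial_dilution(start, factor, steps):
    """Concentrations of a dilution series, starting at `start`."""
    return [start / factor ** i for i in range(steps)]


if __name__ == "__main__":
    print(reconstitution_volume_ml(250.0, 10.0))   # ml for a 10 nM stock
    print(serial_dilution(10.0, 2.0, 4))           # hypothetical two-fold series
```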

The term "indicator molecule or indicator polypeptide" as
used in this invention and described in Section D1 refers to a
molecule encoded by a reporter gene, the expression of which in
the expression vectors of this invention results in a
detectable, measurable protein, polypeptide, enzyme and the
like.

In preferred embodiments, the packaging material includes
a label indicating that eucaryotic cells containing TGF-β
responsive expression vectors can be used for determining the
amount of TGF-β in a liquid sample by a method that includes
the steps of (a) incubating the cells with the selected liquid
sample; (b) measuring the amount of the induced indicator
molecule; and (c) comparing the amount of measured indicator
molecule with a reference curve. Thus, the packaging material
contains a label that is a tangible expression describing the
methods of this invention, as described in Section E, of using
plasmid-transformed eucaryotic cells for quantifying the amount
of TGF-β in a test liquid sample.
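
Step (c), comparing a measured indicator signal with a
reference curve, can be sketched as a simple interpolation. The
example below is an illustration only: the function name, the
piecewise-linear interpolation, the units and the standard
values are all hypothetical and are not taken from this
description.

```python
from typing import List, Tuple

# Hypothetical reference curve: (TGF-beta concentration, indicator signal).
# The numbers are invented for illustration, not measured data.
REFERENCE_CURVE: List[Tuple[float, float]] = [
    (0.0, 100.0),
    (10.0, 300.0),
    (50.0, 1100.0),
]


def estimate_concentration(signal: float,
                           curve: List[Tuple[float, float]]) -> float:
    """Interpolate a concentration from a monotonically rising curve."""
    points = sorted(curve, key=lambda p: p[1])
    lowest, highest = points[0][1], points[-1][1]
    if not lowest <= signal <= highest:
        raise ValueError("signal lies outside the reference curve")
    for (c0, s0), (c1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return c0
            # Linear interpolation between the two bracketing standards.
            return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)
    raise ValueError("no bracketing standards found")


if __name__ == "__main__":
    print(estimate_concentration(700.0, REFERENCE_CURVE))
```

Readings outside the range of the standards raise an error
rather than being extrapolated, since the curve says nothing
about that region.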

The packaging materials discussed herein in relation to
the kit of this invention are those customarily utilized in
kits or diagnostic systems. Such materials include glass and
plastic, the latter of which include polyethylene,
polypropylene and polycarbonate, bottles, vials, plastic and
plastic-foil laminated envelopes and the like.

The eucaryotic cells transformed with the TGF-β responsive
expression vectors of this invention are cells that express
TGF-β receptor on their cell surface as described in Section E.
All normal cells and almost all neoplastic cells have cell
surface membrane receptors, also referred to as binding
proteins, for TGF-β. For review, see Tucker et al., Proc. Natl.
Acad. Sci., USA, 81:6757-6761 (1984) and Frolik et al., J. Biol.
Chem., 259:10995-11000 (1984). The receptors have previously
been described in Section E. Preferred cells for use with the
TGF-β assay kit include mink lung epithelial cells (MLE cells),
HeLa cells, Chinese Hamster Ovary cells, Hep3B cells, GM7373
cells and NIH 3T3 cells, with the C32 clone from the mink lung
epithelial cells being the most preferred cell line.

In preferred embodiments, the eucaryotic cells are
transformed with the expression vector plasmids described in
Section D that have a nucleotide sequence that corresponds to a
sequence in SEQ ID NOs 1-10. Contemplated for use in the kit
are stably and transiently transformed eucaryotic cells. As
described in Section D1, for preparing stably transformed
eucaryotic cells, the plasmids corresponding to SEQ ID NOs 1-6
are preferred for use. A further preferred eucaryotic cell for
use in the kit is the Hep3B cell line co-transfected with
p1500Luc and RSVneo for preparing stably transformed cells that
have been deposited with ATCC having the ATCC Accession Number
CRL 11508 and identified by the designation "LUCI". For
preparing transiently transformed eucaryotic cells, the
plasmids corresponding to SEQ ID NOs 7-10 are preferred for
use.

In preferred embodiments, eucaryotic cells for use with
the kit contain a plasmid having the identifying
characteristics of a plasmid on deposit with ATCC having the
Accession Numbers 75627, 75628 and 75629 as described in
Section C.

The kit of this invention further includes an anti-TGF-β
antibody for use in a parallel control assay for determining
the amount of indicator molecule produced other than by TGF-β
induction. Preferred anti-TGF-β antibodies are anti-TGF-β1,
anti-TGF-β2 or anti-TGF-β3 monoclonal antibodies commercially
available from Genzyme Corp., Cambridge, MA.

Preferred diagnostic assays accomplished with the kit
performed as described herein are for the quantitation of the
amount of TGF-β in a liquid sample. A liquid sample can
include an isoform of TGF-β, specifically TGF-β1, TGF-β2 or
TGF-β3. A liquid sample further includes any body fluid,
culture medium and a tissue extract that may contain unknown
quantities of TGF-β. Thus, the liquid sample includes the body
fluids, serum, plasma, whole blood, lymph fluid, synovial
fluid, follicular fluid, seminal fluid, amniotic fluid, urine,
spinal fluid, saliva, sputum, tears, perspiration, mucus and
the like. Culture medium includes culture supernatant, also
referred to as conditioned medium, collected from cells
maintained in tissue culture as described in Example 3B.
Tissue extracts also encompass extracts of cells, referred to
as cellular extracts. In addition, organs such as placentas
can be obtained and extracted with well known procedures to
prepare placental extracts. Extracts can also be obtained of
any body organ or portion thereof, tissue or cells, including
normal, tumorigenic, and malignant cells. This is generally
accomplished by surgical means, i.e., by biopsy samples
including needle aspirates, tissue scrapings, or freshly
dissected tissues and the like. Extracts of the collected
samples are then prepared by means including homogenization in
lysis buffers, including detergents such as NP-40, Triton
X-100, and the like. Common methods include using potters,
blenders, ultrasound generators, and Dounce homogenizers.

EXAMPLES

The following examples relating to this invention are
illustrative and should not, of course, be construed as
specifically limiting the invention. Moreover, such variations
of the invention, now known or later developed, which would be
within the purview of one skilled in the art are to be
considered to fall within the scope of the present invention
hereinafter claimed.

1. Preparation of Expression Vectors Containing TGF-β
Response Elements

A. Source, Cloning Vector Constructs and
Preparation of Expression Vectors for Stable
Transformation

Eucaryotic expression vectors having a regulatory
region having at least one TGF-β response element derived from
the 5' promoter end of the human plasminogen activator
inhibitor type 1 (PAI-1) gene operatively linked to a PAI-1
minimal promoter and a downstream structural region containing
a gene coding for an indicator polypeptide, preferably
luciferase, were prepared and designated generally as PAI/L
eukaryotic expression constructs. Operatively linking refers
to the covalent joining of nucleotide sequences, preferably by
conventional phosphodiester bonds, into one strand of DNA,
whether in single- or double-stranded form. Moreover, the
joining of nucleotide sequences results in the joining of
functional elements such as response elements in regulatory
regions with promoters and downstream structural regions as
described herein.

The expression vector constructs of this invention were
then used for preparing stably transformed cells for use in the
quantitative TGF-β assays of this invention. The expression
vectors were designed to contain varying lengths and
arrangements of the TGF-β response elements from the PAI-1
promoter, a neomycin-resistance conferring gene for selection
and a gene encoding an indicator polypeptide, preferably
luciferase. Two starting vectors were required to prepare the
expression vectors having a neomycin-resistance conferring
gene. One of these starting cloning plasmid vectors,
designated p19Luc, was previously described by van Zonneveld et
al., Proc. Natl. Acad. Sci., USA, 85:5525-5529 (1988), the
disclosure of which is hereby incorporated by reference.

1) Preparation of Cloning Vector p19Luc

The promoter-less reporter gene p19Luc plasmid
was originally designed by van Zonneveld et al., Proc. Natl.
Acad. Sci., USA, 85:5525-5529 (1988) to monitor promoter
activity with a structural region, having the firefly
luciferase gene to function as a reporter gene, fused to an SV40
splice and polyadenylation site. The p19Luc plasmid also
contained a multiple cloning site preceded by two SV40-derived
polyadenylation sites. The p19Luc plasmid was constructed from
pSV0AL-AΔ5', a vector described by De Wet et al., Mol. Cell.
Biol., 7:725-737 (1987). The pSV0AL-AΔ5' was first linearized
with Hind III and one portion of the plasmid was blunt-ended by
filling in the Hind III sites with E. coli DNA polymerase I
large fragment (Klenow) and ligated to phosphorylated Eco RI
linkers (New England Biolabs, Beverly, MA). Two of the
resulting fragments, the 621 bp fragment originally containing
the 5' end of the luciferase gene and the 2718 bp fragment
originally located on the 5' end of this fragment, were
isolated. A second portion of the Hind III-cleaved pSV0AL-AΔ5'
was ligated to a 55 bp polylinker and cleaved with Eco RI. The
resulting 2831 bp fragment containing the multiple cloning site
and the pBR322-derived ampicillin resistance-conferring gene
was isolated. These fragments were ligated to create the
circular double-stranded p19Luc plasmid that contained the
three fragments in their original orientation but with the
multiple cloning site in the original Hind III site.

The continuous 6170 bp sense strand, also referred to as
the coding strand, nucleotide sequence of an Eco RI-linearized
p19Luc vector is listed in the Sequence Listing as SEQ ID NO
21. The convention adopted for listing the nucleotide
sequences of the p19Luc vector as well as all the expression
vectors of this invention derived from p19Luc is to list only
the sense strand of each vector with the nucleotide position 1
always beginning with the middle of the Eco RI site,
specifically the first T nucleotide.
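
The numbering convention above can be illustrated with a short
sketch that rotates a circular sequence so that position 1 is
the first T of an Eco RI recognition site (GAATTC). The function
name and the toy sequence are hypothetical; a real plasmid may
contain several Eco RI sites, in which case the caller would
have to choose which one to use.

```python
ECO_RI_SITE = "GAATTC"


def linearize_at_ecori(circular: str, site_number: int = 0) -> str:
    """Rotate a circular sequence to start at the first T of an Eco RI site.

    site_number selects which occurrence to use when several exist.
    """
    seq = circular.upper()
    n = len(seq)
    # Append the start of the sequence so a site spanning the origin is found.
    wrapped = seq + seq[:len(ECO_RI_SITE) - 1]
    starts = [i for i in range(n)
              if wrapped[i:i + len(ECO_RI_SITE)] == ECO_RI_SITE]
    if not starts:
        raise ValueError("no Eco RI site found")
    # GAA|TTC: the first T is three bases into the recognition site.
    cut = (starts[site_number] + 3) % n
    return seq[cut:] + seq[:cut]


if __name__ == "__main__":
    toy = "CCGAATTCGG"  # invented 10-base circle with one Eco RI site
    linear = linearize_at_ecori(toy)
    print(linear, linear[:3], linear[-3:])
```

With this rotation the listed strand begins with TTC and ends
with GAA, matching the description of a sequence that starts at
the middle T and finishes at the middle A of the Eco RI site.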

The Eco RI-linearized p19Luc vector contained the
following list of elements and restriction sites beginning with
the 5' middle Eco RI "T" nucleotide position 1 and extending to
the 3' end of the vector ending with the middle Eco RI "A"
nucleotide position 6170 (nucleotide positions as listed in SEQ
ID NO 21 are indicated in parentheses): a Pst I restriction
site (750-755) within the pBR322-derived ampicillin resistance-
conferring gene (amp); an Acc I restriction site downstream of
the amp gene (2113-2118); two tandem polyadenylation sites
immediately upstream of the multiple cloning site beginning
with Bam HI (2771-2776) and Hind III (2778-2783), continuing
with adjacent Sph I, Pst I, Hinc II/Acc I/Sal I, Xba I, Bam HI,
Xma I/Sma I, Kpn I, Sst I, and ending with Eco RI (2829-2834);
the luciferase gene adjacent to the Eco RI site in which are
four restriction sites, Xba I (2910-2915), Eco RI (3450-3455),
Sph I (3522-3527), and Xba I (4564-4569); an SV40 splice site
adjacent to the 3' end of the luciferase gene followed by a
third polyadenylation site; a Bam HI restriction site (5417-
5422); and lastly a Pst I restriction site (5962-5967).

For use in preparing the expression vectors of this
invention, the multiple cloning site in the promoterless p19Luc
plasmid described above allowed for the directional ligation of
both non-TGF-β responsive promoters and TGF-β responsive
regulatory regions containing TGF-β response elements, the
latter of which comprised the regulatory region of the
resultant vectors. The promoters and TGF-β response elements
and the ligation thereof to form TGF-β expression vectors are
described herein and below.

Thus, the p19Luc plasmid was used as a cloning vector for
construction of all the expression vectors of this invention.
The advantage of using the p19Luc and the p19Luc-derived p39Luc
expression cloning vectors, the latter of which is described
below, is that the vectors allow for the construction of TGF-β
responsive vectors having a selected regulatory region
operatively ligated to a selected promoter. Therefore, any
regulatory region of any length containing one or more TGF-β
response elements can be paired with any promoter, including
but not limited to the non-TGF-β responsive PAI-1 or
heterologous HBV promoters used herein, to prepare TGF-β
responsive expression vectors that provide for the quantitation
of inducing TGF-β.

While specific expression vector constructs having the
preselected regulatory regions as described herein were
prepared for use in this invention, also contemplated are
expression vectors having regulatory regions with TGF-β
response elements that are either longer, shorter, tandemly
arranged, reversed, permutations thereof and the like
operatively ligated to a selected promoter. Moreover, in
addition to the construction methods detailed herein, other
methods of preparing p19Luc-derived expression vectors having
TGF-β response elements and promoters are familiar to one of
ordinary skill in the art of vector construction and are
described by Ausubel et al., In Current Protocols in Molecular
Biology, Wiley and Sons, New York (1993) and by Sambrook et
al., Molecular Cloning: A Laboratory Manual, Cold Spring
Harbor Laboratory, 1989.

2) Preparation of Expression Vector p1500Luc

One expression vector of this invention,
designated p1500Luc, was constructed from p19Luc and a cosmid
containing the PAI-1 promoter in which TGF-β response elements
are located. To prepare p1500Luc, a 1547 base pair (bp) Kpn I-Eco RI
fragment of the PAI-1 promoter was obtained from a
cosmid containing the entire PAI-1 gene (Loskutoff et al.,
Biochem., 26:3763-3768 (1987), the disclosure of which is
hereby incorporated by reference) and was cloned into the Kpn I
and Eco RI sites of pUC19, a plasmid available from American
Type Culture Collection, Rockville, MD with the ATCC Accession
Number 37254, to create a vector designated pUCEK19. The
fragment contained the 1442 bp TGF-β response element (SEQ ID
NO 11) from the PAI-1 promoter that corresponded to nucleotide
position -1481 and extended to the nucleotide position -40
continuous with a 115 bp minimal (non-TGF-β responsive) PAI-1
promoter sense strand sequence (SEQ ID NO 18) corresponding to
nucleotide position -39 ending with an E. coli DNA polymerase
filled-in Eco RI site at nucleotide position +76 as
described by Bosma et al., J. Biol. Chem., 263:9129-9141
(1988). The entire 15,867 bp PAI-1 gene sequence including
significant stretches of DNA that extend into its 5'- and 3'-
flanking DNA regions was described by Bosma et al., J. Biol.
Chem., 263:9129-9141 (1988), and is available in the
GenBank™/EMBL Data Bank with accession number(s) J03764.
To create a sensitive reporter gene system with a
regulatory region having the 1442 bp TGF-β response element of
the PAI-1 promoter contiguous with the minimal PAI-1 promoter,
the pUCEK19 plasmid prepared above was then digested with Kpn I
and Eco RI and the isolated fragment was then ligated into the
multiple cloning site of a similarly digested p19Luc. The
resulting vector was designated p1500Luc.
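
The lengths quoted for the response element (-1481 to -40, 1442
bp) and the minimal promoter (-39 to +76, 115 bp) are consistent
with promoter numbering in which there is no position 0, so
that -1 is followed directly by +1. The sketch below checks that
arithmetic; the function name is hypothetical and the absence of
a zero position is an assumption inferred from the quoted
lengths.

```python
def span_length(start: int, end: int) -> int:
    """Number of bases from start to end inclusive, with no position 0."""
    if start == 0 or end == 0:
        raise ValueError("promoter numbering has no position 0")
    if start > end:
        raise ValueError("start must not exceed end")
    length = end - start + 1
    # Crossing from -1 to +1 skips the nonexistent position 0.
    if start < 0 < end:
        length -= 1
    return length


if __name__ == "__main__":
    print(span_length(-1481, -40))  # response element
    print(span_length(-39, 76))     # minimal promoter
```

Both calls reproduce the figures given in the text, 1442 and 115
respectively.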

3) Preparation of Expression Vector p800Luc

Another vector, designated p800Luc, was prepared
for subsequent construction of p800neoLuc as described below.
The p800Luc plasmid, having a deletion in the 5' end of the
PAI-1 construct so that the 5' end began with the -800
nucleotide in the native PAI-1 promoter, was prepared by
digesting the PAI-1-gene-containing cosmid described above with
Hind III and Eco RI. The actual Hind III-Eco RI digest of the
PAI-1 promoter resulted in a fragment that corresponded to
nucleotides -799 to +71 bp in the PAI-1 promoter that was
subsequently ligated into a similarly digested p19Luc vector
forming a PAI-1 region extending from nucleotide -800 to +76.
The resulting p800Luc plasmid retained all the features of
p19Luc with the exception of the insertion of the PAI-1-derived
regulatory region having a TGF-β response element and a
promoter.

The restriction fragments described to prepare p1500Luc
and p800Luc had an identical 3' end (an Eco RI site at +71
nucleotide of the PAI-1 promoter) and a different 5' end. The
vectors, p1500Luc and p800Luc, were used for transient
transformations as they lacked a selectable marker gene. The
p1500Luc plasmid was also used to prepare stable
transformations with a second vector as described in Example
1C. In addition, the p800Luc served as the starting cloning
construct for the preparation of p800neoLuc as described below.
The TGF-β response element in the -800 to +76 PAI-1 promoter
region began at -800 and ended at -40, the nucleotide sequence
of which is listed in SEQ ID NO 12. The remaining nucleotides,
which comprised the non-TGF-β responsive minimal promoter in
this PAI-1 fragment, are listed in SEQ ID NO 18.

4) Preparation of Cloning Vector p39Luc

An expression vector, designated p39Luc, having
a promoter for activating transcription of the luciferase gene
while lacking TGF-β response elements, thereby lacking
responsiveness to TGF-β, was prepared as described by Keeton et
al., J. Biol. Chem., 266:23048-23052 (1991). A fragment of the
PAI-1 promoter, i.e., between -39 and +76, which had been
determined in the TGF-β assay as described in Example 3A to
have low basal activity and only minimal response to TGF-β
(average induction of 2.7-fold), was used as a minimal promoter
in the constructs for use in quantifying the amount of TGF-β in
a test liquid sample. Since the minimal promoter sequence
conferred only a minimal background response to TGF-β as shown
in Example 3A, the minimal PAI-1-derived promoter is also
referred to as being "non-TGF-β responsive".

Briefly, the p800Luc vector was linearized by digestion
with Hind III followed by 5' digestion of PAI-1 promoter with
Bal-31 slow exonuclease (International Biotechnologies, New
Haven, CT) as described by Keeton et al., J. Biol. Chem.,
266:23048-23052 (1991). The digestion was allowed to proceed
until the -39 nucleotide position of the PAI-1 promoter was
reached. Thereafter, the linearized and Bal-31 digested
plasmid was ligated with T4 ligase forming a double-stranded
circular vector designated p39Luc.

The resultant expression vector, into which TGF-β response
elements were subsequently ligated as described in Example 1C,
contained the PAI-1 minimal promoter nucleotide sequence
corresponding to -39 to +76 of the promoter as listed in SEQ ID
NO 18. This minimal promoter was operatively linked to and
continuous with the structural region that contained the
firefly luciferase gene present in the vector. Since the
p39Luc cloning vector was derived from p800Luc which itself was
derived from p19Luc, the remaining elements and features of the
vector were retained unchanged from p19Luc. The 6229 bp sense
strand nucleotide sequence of the Eco RI-linearized p39Luc
vector is listed in SEQ ID NO 23.

The p39Luc cloning expression vector is also obtained by
preparing a double-stranded oligonucleotide sequence
corresponding to the sequence in SEQ ID NO 18 and ligating it
into the Hind III/Eco RI multiple cloning site of p19Luc. The
overhang from the Hind III/Eco RI digests in the p19Luc vector
is first digested with mung bean nuclease and followed by
ligation with the blunt-ended double-stranded oligonucleotide
promoter. Other construction methods are well known to and
easily accomplished by one of ordinary skill in the art.

The p39Luc vector was useful for operatively ligating
regulatory regions that contained TGF-β response elements
resulting in an expression vector that was responsive to DNA-
binding proteins, the result of which was induction of the
transcription and translation of the indicator molecule,
luciferase. TGF-β responsive expression vectors for use in
practicing this invention having TGF-β response elements other
than those specified herein are readily constructed through the
use of either p19Luc or p39Luc starting cloning expression
vectors.



5) Preparation of Cloning Vector pHBVLuc

To create expression vectors having heterologous
non-TGF-β responsive promoters instead of having the PAI-1-
derived minimal promoter described above, a minimal promoter
construct derived from the Hepatitis B viral promoter (HBV) was
selected. This promoter contained the nucleotide sequence from
-188 to +145 of the Hepatitis B promoter and showed only a
4-fold induction in response to TGF-β. The sense strand of the
double-stranded nucleotide sequence of the HBV minimal promoter
is listed in SEQ ID NO 19. The 6464 bp sense strand nucleotide
sequence of the Eco RI-linearized pHBVLuc vector is listed in
SEQ ID NO 25.

6) Preparation of Expression Vector p800neoLuc

For preparing an expression vector for use in
stable transformations, the neomycin-resistance conferring gene
from pMAMneo (Clontech, Palo Alto, CA) was inserted into the
p800Luc vector containing -800 to +76 of the 5' end of the
human PAI-1 gene followed by the firefly luciferase gene. As
shown in Figure 1, p800Luc prepared above was first digested
with Acc I, repaired to blunt ends with the Klenow fragment of
DNA polymerase I, and then was isolated. The pMAMneo plasmid
was digested with Sal I and Eco RI then blunt-ended with
Klenow. The neomycin-resistance gene containing fragment was
then isolated and had the 4302 bp sense strand nucleotide
sequence listed in the Sequence Listing in SEQ ID NO 20. The
linearized p800Luc and neomycin-resistance fragment were
ligated, and one clone with the insert in the correct
orientation was selected by restriction mapping and designated
p800neoLuc. The entire Eco RI-linearized 11293 bp nucleotide
sequence of the sense strand of the double-stranded p800neoLuc
vector is listed in the Sequence Listing in SEQ ID NO 1. DNA
sequencing was performed by a modification of the dideoxy
chain-termination procedure with a Sequenase kit (United States
Biochemical; Cleveland, OH). This clone, purified from large
scale plasmid preparations via CsCl gradients, was used for
subsequent transfections.

Since the p800neoLuc cloning vector was derived from
p800Luc which itself was derived from p19Luc, the remaining
elements and features of the vector were retained unchanged
from p19Luc. The p800neoLuc vector thus contained the
neomycin-resistance conferring gene providing for stable
transformants. The p800neoLuc vector also contained an
operatively ligated regulatory region that contained a TGF-β
response element in the sequence corresponding to -800 to -40
of the PAI-1 promoter resulting in an expression vector that
was responsive to TGF-β. With this expression vector
construct, the induced activation of the transcription and
translation of the indicator molecule, luciferase, was obtained
further allowing for the quantitation of the amount of TGF-β
responsible for activating gene expression.

7) Preparation of Cloning Vector p39neoLuc

To create an expression vector useful for
constructing TGF-β responsive vectors that resulted in stably
transformed cells, the p39Luc cloning vector prepared above was
linearized as described above for p800Luc and ligated with the
neomycin-resistance conferring gene fragment from pMAMneo. The
construction of the vector was performed as described in
Example 1A6). The resultant p39neoLuc cloning expression
vector had the Eco RI-linearized 10533 bp sense strand
nucleotide sequence listed in SEQ ID NO 22. Regulatory
regions containing TGF-β response elements were operatively
ligated 5' to the minimal promoter sequence of the p39neoLuc as
described in Example 1C for the preparation of plasmids for
transient transformation.

8) Preparation of Cloning Vector pHBVneoLuc

To create an expression vector useful for
constructing TGF-β responsive vectors with a heterologous
promoter for stably transforming cells, the pHBVLuc cloning
vector prepared above was linearized as described above for
p800Luc and ligated with the neomycin-resistance conferring
gene fragment from pMAMneo. The construction of the vector was
performed as described in Example 1A6). The resultant
pHBVneoLuc cloning expression vector had the Eco RI-linearized
10768 bp sense strand nucleotide sequence listed in SEQ ID
NO 24. Regulatory regions containing TGF-β response elements
were operatively ligated 5' to the minimal promoter sequence of
the pHBVneoLuc as described in Example 1C for preparing
plasmids for transient transformation.
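
As a bookkeeping check on the vector lengths quoted in this
Example, the size added by inserting the neomycin-resistance
fragment can be computed for each parent and derivative pair.
The sketch below uses only the base-pair figures stated in the
text; the dictionary layout and names are assumptions made for
illustration.

```python
# Vector lengths in bp as quoted in this Example.
VECTOR_SIZES_BP = {
    "p39Luc": 6229,
    "p39neoLuc": 10533,
    "pHBVLuc": 6464,
    "pHBVneoLuc": 10768,
}
NEO_FRAGMENT_BP = 4302  # SEQ ID NO 20 fragment length as quoted

PAIRS = [("p39Luc", "p39neoLuc"), ("pHBVLuc", "pHBVneoLuc")]


def insert_sizes(sizes: dict, pairs: list) -> dict:
    """Return the size difference between each neo vector and its parent."""
    return {neo: sizes[neo] - sizes[parent] for parent, neo in pairs}


if __name__ == "__main__":
    for name, added in insert_sizes(VECTOR_SIZES_BP, PAIRS).items():
        print(name, added, added - NEO_FRAGMENT_BP)
```

Both pairs differ by 4304 bp, two bases more than the quoted
4302 bp fragment. The passage does not state where the two
additional bases arise, so the sketch only reports the
difference.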

9) Preparation of p1500neoLuc,
p800/636neoLuc, p56neoLuc,
p674neoLuc, p743neoLuc and p732neoLuc
Expression Vectors

The p1500Luc vector prepared above is similarly
ligated with the neomycin-resistance gene from pMAMneo to form
p1500neoLuc. Other PAI-1-promoter containing expression
vectors lacking the neomycin resistance gene, p800/636Luc,
p56Luc, p674Luc, p743Luc and p732Luc, containing smaller TGF-β
response elements were prepared as described in Example 1C. To
create the corresponding neomycin-resistance expression vectors
for stably transforming recipient cells, the neomycin-
resistance gene from pMAMneo is separately ligated with each of
these five vectors to form expression vectors used for
generating stable cell transformations. The five resultant
vectors having the neomycin-resistance gene inserted are
designated p800/636neoLuc (10697 bp), p56neoLuc (10549 bp),
p674neoLuc (10558 bp), p743neoLuc (10569 bp) and p732neoLuc
(10558 bp) and have the respective complete nucleotide
sequences of the sense strand from the Eco RI-linearized
double-stranded vectors in SEQ ID NOs 2-6.

Depending on the vector into which the PAI-1 promoter
fragments were cloned, the designated names either had "Luc"
alone or "neoLuc" respectively for vectors lacking the neomycin
(neo) selectable marker gene or containing it. In addition,
the plasmids were further designated by the 5' end of the PAI-1
TGF-β response element. For example, five plasmids with
shorter TGF-β response elements were thus named p800/636neoLuc,
p56Luc, p674Luc, p743Luc and p732Luc.

As with all the expression vectors of this invention, the
operative elements from the original cloning vector p19Luc,
from which the vectors were all derived, were retained.

The above neomycin-resistance containing expression
vectors were then used in the TGF-β assay method as described
in Example 3 following transformation of host recipient cells.

B. Expression Vectors for Co-Transformation of
TGF-β Responsive Vectors and a Selectable
Marker Vector for Stable Transformation

Stably transformed Hep3B cells were also obtained as
described in Example 2B below through the use of co-
transfections of a TGF-β responsive vector lacking a selectable
marker gene of this invention, specifically the p1500Luc
prepared in Example 1A2), with a selectable marker vector,
RSVneo, available from American Type Culture Collection (ATCC),
Rockville, MD, ATCC Accession Number 37198. The stably
transformed cell line containing plasmid p1500Luc, designated
LUCI, was deposited with the ATCC on or before December 16,
1993 and was assigned the ATCC Accession Number CRL 11508.

C. Expression Vectors for Transient Transformation

Additional TGF-β responsive expression vectors were
prepared for use in this invention. In the vectors prepared as
described herein, the TGF-β response elements having a smaller
length, thereby providing responsiveness to TGF-β with reduced
or absent responsiveness to other growth modulators, were made
either by restriction digestion of the PAI-1 promoter or by
synthesizing double-stranded blunt-end oligonucleotides. The
oligonucleotide sequences corresponded to preselected regions
of the PAI-1 promoter sequence. The resultant TGF-β response
elements present within a regulatory region were then
directionally ligated into p39Luc or p39HBV.

The regulatory region from the PAI-1 promoter
corresponding to nucleotide position -800 up to and including
-636 was obtained by restriction digestion and had the
following sense strand sequence:

5'AAGCTTACCATGGTAACCCCTGGTCCCGTTCAGCCACCACCACCCCACCCAGCACACCTCC
AACCTCAGCCAGACAAGGTTGTTGACACAAGAGAGCCCTCAGGGGCACAGAGAGAGTCTGGAC
ACGTGGGGAGTCAGCCGTGTATCATCGGAGGCGGCCGGGCA3' (SEQ ID NO 13).

The additional selected regions for preparing oligonucleotides
included the following sense strand nucleotide sequences with
the indicated nucleotide positions as present in the intact
PAI-1 promoter: 1) promoter nucleotide position -56 up to and
including -41: 5'AGTTCATCTATTTCCT3' (SEQ ID NO 14); 3)
promoter nucleotide position -674 up to and including -650:
5'GTGGGGAGTCAGCCGTGTATCATCG3' (SEQ ID NO 15); 4) nucleotide
position -743 up to and including -708:
5'CTCCAACCTCAGCCAGACAAGGTTGTTGACACAAGA3' (SEQ ID NO 16); and 5)
nucleotide position -732 up to and including -708:
5'GCCAGACAAGGTTGTTGACACAAGA3' (SEQ ID NO 17). The
complementary sequences to each of the sense oligonucleotide
sequences were also synthesized to allow for the formation of
double-stranded oligonucleotides for ligation 5' to the PAI-1
minimal promoter sequence containing the TATA box.
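
The relationship between the -800 to -636 fragment and the
shorter oligonucleotides, and the complementary strands that
were synthesized, can be sketched as follows. This is an
illustration under stated assumptions: the function names are
hypothetical, and the fragment is transcribed here with the AA
at positions -739 and -738 taken from the overlapping SEQ ID NO
16 oligonucleotide.

```python
# SEQ ID NO 13 as transcribed for this sketch (-800 through -636).
# Assumption: the AA opening the second segment is read from SEQ ID NO 16.
SEQ13 = (
    "AAGCTTACCATGGTAACCCCTGGTCCCGTTCAGCCACCACCACCCCACCCAGCACACCTCC"
    "AACCTCAGCCAGACAAGGTTGTTGACACAAGAGAGCCCTCAGGGGCACAGAGAGAGTCTGGAC"
    "ACGTGGGGAGTCAGCCGTGTATCATCGGAGGCGGCCGGGCA"
)
FIVE_PRIME = -800  # promoter coordinate of the first base of SEQ13

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def promoter_slice(start: int, end: int) -> str:
    """Return bases start..end inclusive (negative promoter coordinates)."""
    if not FIVE_PRIME <= start <= end < 0:
        raise ValueError("coordinates must lie within the upstream fragment")
    segment = SEQ13[start - FIVE_PRIME:end - FIVE_PRIME + 1]
    if len(segment) != end - start + 1:
        raise ValueError("coordinates extend past the fragment")
    return segment


def reverse_complement(seq: str) -> str:
    """Return the complementary strand written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    sense = promoter_slice(-732, -708)       # SEQ ID NO 17 region
    print(sense)
    print(reverse_complement(sense))         # strand to anneal with it
```

Slicing the fragment at the stated coordinates returns the
listed sense oligonucleotides for the -743, -732 and -674
regions, which is a useful consistency check on the coordinates.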

The resulting double-stranded oligonucleotides were then
separately operatively linked to the -39 position of this
minimal promoter sense strand sequence listed in SEQ ID NO 18
present in the expression vector, p39Luc, prepared as described
in Example 1A4). The sequences were confirmed by double-
stranded sequencing methods.

The resulting five plasmids with shorter TGF-β response
elements were thus named p800/636Luc, p56Luc, p674Luc, p743Luc
and p732Luc. The plasmids, p56Luc, p674Luc, p743Luc and
p732Luc, have the respective complete sense strand nucleotide
sequences beginning with the middle T of the Eco RI site as
previously described listed in SEQ ID NOs 7-10. The plasmids,
p674Luc, p743Luc and p732Luc, were deposited with ATCC as
described in Example 5 and respectively assigned the ATCC
Accession Numbers 75627, 75628 and 75629.

In similar procedures, five plasmids having a heterologous
hepatitis B viral promoter, HBV, instead of the PAI-1 minimal
promoter were prepared with the shorter TGF-β response
elements, p800/636Luc, p56Luc, p674Luc, p743Luc and p732Luc.
The HBVLuc cloning expression vector was prepared as described
in Example 1A5). The TGF-β response elements were ligated into
linearized HBVLuc, prepared as described in Example 1A5), to
form TGF-β response element-containing plasmids lacking the
neomycin-resistance-conferring gene.

Furthermore, as previously mentioned, the cloning vector
constructs, p19Luc and p39Luc, provide for the operative
linking of preselected regulatory regions with preselected
promoters, both of which are not limited to the specific
constructs described herein and above. Additional TGF-β
response elements in varied lengths and arrangements along with
promoters that provide for the transcription of the reporter
gene are contemplated for use in this invention.

2. Transformation of Eucaryotic Cells with Expression
Vectors Containing TGF-β Response Elements

A. Recipient Eucaryotic Cells

To identify the cell types most responsive to TGF-β
in which to transfect the TGF-β responsive expression vectors
for use in assaying the amount of TGF-β, the vectors prepared
in Example 1 were transfected as described in Example 2B and 2C
into recipient cell lines including mink lung epithelial cells
(MLE cells) (ATCC CCL 64), HeLa cells (ATCC CCL 2), Chinese
hamster ovary cells (CHO cells) (ATCC CCL 61), GM7373 (chemically
transformed fetal bovine aortic endothelial cells or BAEs)
(NIGMS Human Genetic Mutant Cell Repository, Camden, NJ), Hep3B
(ATCC HB 8064) and NIH 3T3 cells (ATCC CRL 1658).

B. Stable Transformation

For preparing stably transfected cells for use with
expression vectors containing the pMAMneo construct prepared in
Example 1A, transfections of mink lung epithelial cells
(hereinafter referred to as MLE cells to distinguish them from
the TGF-β proliferation assay called MLEC) were performed. The
MLE cells were seeded at 7 x 10^5 cells/100 mm dish for 24 hours,
at which point they were transfected with the PAI/L construct,
p800neoLuc, by calcium phosphate precipitation as described by
Wigler et al., Proc. Natl. Acad. Sci., USA, 76:1373-1376
(1979). Twenty-four hours after transfection, the medium was
replaced and supplemented with 400 µg/ml of Geneticin. The
resistant cells were expanded in mass culture or cloned by
limiting dilution for further experiments. Following
selection, transfected MLE cells were maintained in DMEM
containing 10% fetal calf serum and 250 µg/ml Geneticin (G-418
sulfate) (Gibco BRL, Grand Island, NY).

Stable transformations are also performed as described
above with the expression vectors p800/636neoLuc, p56neoLuc,
p674neoLuc, p743neoLuc and p732neoLuc, all of which are
prepared as described in Example 1A.



C. Stable Transformation Obtained by Co-transfection
of Cells

For transfecting 6 wells, 15 micrograms (µg) of
p1500Luc expression vector prepared in Example 1A2), which did
not have a neomycin-resistance gene, was admixed with 3 µg of a
plasmid encoding the neomycin selectable marker gene driven
from a respiratory syncytial virus promoter, RSVneo. The
RSVneo plasmid is available from ATCC with ATCC Accession
Number 37198. Hep3B cells at a concentration of 6 x 10^5
cells/well were seeded as described above in Example 1B for 24
hours, at which point they were transfected with the PAI/L
construct, p1500Luc, by calcium phosphate precipitation
followed by selection with Geneticin. The resultant cell line
stably transformed with p1500Luc, designated LUCI, was
deposited with ATCC on December 16, 1993 and was assigned the
ATCC Accession Number CRL 11508.



D. Transient Transformation

For preparing transiently transformed cells
containing TGF-β responsive expression vectors lacking the
neomycin resistance gene prepared as described in Example 1C,
Hep3B human hepatoma cells obtained from ATCC (ATCC Accession
Number HB 8064) were maintained in DMEM/Ham's F-12 (Whittaker
Bioproducts, Walkersville, MD) supplemented with 10% fetal
bovine serum (Hyclone Laboratories, Logan, UT), glutamine,
sodium pyruvate, non-essential amino acids and
penicillin/streptomycin (Whittaker). For transfection
experiments, semiconfluent cells in 6-well (10 cm^2 per well)
tissue culture plates (Corning Inc., Corning, NY) were washed
twice with serum-free medium (DMEM/F-12), then incubated in
serum-free medium. Separate mixtures (50 µl/well) of lipofectin
(GIBCO, Grand Island, NY) at a concentration of 12.5 µg/well
and DNA vector constructs prepared in Examples 1A-1C at a
concentration of 2.5 µg/well, each in water, were added to the
cell-containing wells and the plates were incubated for 18
hours. After lipofection, plates were incubated an additional
24 hours in the absence or presence of 1 ng/ml TGF-β provided
by Berlix Biosciences, South San Francisco, CA. The monolayers
were then washed, followed by extraction into 0.25% Triton
X-100. Each construct was tested with at least 2 independent DNA
preparations in order to rule out any effects related to
differences in DNA preparation. For each experiment, two
independent transfections were performed with every construct.

3. Method for Quantifying the Amount of TGF-β in a
Sample

A. The TGF-β Assay Method

The p800neoLuc construct stably transfected into
Hep3B cells was used in the initial characterization of the
assay method as described herein. TGF-β measurement assays
performed with cells transiently transformed with the remaining
expression vectors containing TGF-β response elements are
presented in Example 4.

The TGF-β assay allows for the quantification of the
amount of TGF-β in a liquid sample, containing either purified
TGF-β or TGF-β in a heterogeneous admixture. The assay system
provides for the quantification of TGF-β through the expression
of an indicator polypeptide, such as luciferase. When TGF-β
receptor-bearing cells, transfected with a TGF-β responsive
expression vector of this invention, are exposed to TGF-β, the
activation of the TGF-β response element in the vector results
in the concomitant expression of luciferase. The resulting
expressed luciferase is isolated and then measured as described
herein. The measured luciferase resulting from activation by
TGF-β in the test liquid sample is then compared to a
standardized reference curve.

This reference curve is obtained from parallel assays
performed by exposing similarly transfected cells to a range of
known measured amounts of TGF-β, one or more of the known TGF-β
isoforms. The resulting expressed luciferase is then
determined in a luminometer. A reference curve is then
generated by plotting the measured amount of expressed
luciferase against the known range of inducing amounts of
TGF-β. The amount of unknown TGF-β in the test liquid sample is
then determined by extrapolating the measured amount of test
luciferase to the reference curve. The use of standard curves
in quantifying the amount of protein in a liquid sample in
general has been described by Lowry et al., J. Biol. Chem.,
193:265-275 (1951), the disclosure of which is hereby
incorporated by reference. As shown in the Examples herein,
the TGF-β assay of this invention allows for the measurement of
TGF-β from the expression and subsequent detection of an
indicator polypeptide over a concentration range from less than
5 picograms/ml (pg/ml), equivalent to 0.2 pM, to 10 ng/ml,
equivalent to 0.4 nM. The dose-dependent response is linear
from 0.2 pM up to 30 pM, and even up to 100 pM depending on
the assay conditions.
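
The mass and molar figures quoted above can be cross-checked with a short calculation. The following is a minimal sketch in Python, assuming the 25 kD molecular weight given for the TGF-β homodimer in the Background; the function names are illustrative only and are not part of the described method.

```python
# Assumption: 25 kD homodimer, as stated in the Background section.
TGF_BETA_MW_DA = 25_000.0


def pg_per_ml_to_pM(pg_per_ml, mw_da=TGF_BETA_MW_DA):
    """Convert a mass concentration (pg/ml) to a molar one (pM).

    1 pg/ml = 1e-9 g/L, and 1 pM = 1e-12 mol/L, so
    pM = (pg/ml * 1e-9 / MW) / 1e-12 = pg/ml * 1000 / MW.
    """
    return pg_per_ml * 1000.0 / mw_da


def pM_to_pg_per_ml(pM, mw_da=TGF_BETA_MW_DA):
    """Inverse conversion: pM back to pg/ml."""
    return pM * mw_da / 1000.0


if __name__ == "__main__":
    print(pg_per_ml_to_pM(5.0))        # lower bound quoted in the text, in pM
    print(pg_per_ml_to_pM(10_000.0))   # 10 ng/ml expressed in pM
```

With these assumptions, 5 pg/ml corresponds to 0.2 pM and 10 ng/ml to 400 pM (0.4 nM), in agreement with the range stated in the text.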

An additional aspect of the assay for quantifying TGF-β in
complex solutions was the use of neutralizing anti-TGF-β
monoclonal antibodies admixed with the test liquid sample in
assays run in parallel to untreated test liquid samples as
described in Example 3B. These control assays are used to
determine whether other molecules are present in the test sample
that can affect the assay through either inhibition or
activation of other regions of the truncated PAI-1 promoter.
For example, conditioned medium obtained from cell cultures and
body fluids contain growth factors and DNA binding proteins
that function as transcriptional activators or inhibitors. If
a corresponding response element for an additional non-TGF-β
activator or inhibitor is present in the expression vector, the
binding of that molecule to the response element may cause
enhanced or diminished expression of the indicator polypeptide.
By antibody neutralization of the TGF-β in the test sample, any
residual measured luciferase can then be ascribed to non-TGF-β
activation.
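
The logic of the parallel antibody control can be written out as a simple subtraction. The sketch below is illustrative only: the function name and the RLU readings are hypothetical, and clamping the difference at zero is an assumption of this sketch rather than a step described in the text.

```python
def split_signal(untreated_rlu, neutralized_rlu):
    """Split a luciferase reading into TGF-beta and non-TGF-beta parts.

    untreated_rlu   -- reading from the sample assayed as-is
    neutralized_rlu -- reading from the same sample assayed in parallel
                       with neutralizing anti-TGF-beta antibodies
    Returns (tgf_beta_rlu, residual_rlu). The residual is the signal
    ascribed to non-TGF-beta activation.
    """
    if untreated_rlu < 0 or neutralized_rlu < 0:
        raise ValueError("RLU readings must be non-negative")
    # Assumption of this sketch: noise can make the neutralized reading
    # exceed the untreated one, so the difference is clamped at zero.
    tgf_beta_rlu = max(untreated_rlu - neutralized_rlu, 0.0)
    return tgf_beta_rlu, neutralized_rlu


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    print(split_signal(1000.0, 150.0))
```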

The shorter TGF-β response elements used in the expression
vector systems of this invention, even including the longer
p800neoLuc, are less likely to have non-TGF-β response elements
that are bound by other DNA-binding proteins, as shown in
Examples 3C-3F. Thus, the use of parallel antibody control
assays to allow for a determination of the amount of luciferase
produced from only TGF-β activation is preferred when
expression vectors having longer response elements are used.
Moreover, while the TGF-β assay is not isoform specific, using
the appropriate standard reference curves and parallel assays
with neutralizing antibodies to the various TGF-β species
allows for quantification of individual TGF-β isoforms.

The reagents used in the assays described herein, with their
sources, were as follows: recombinant human
TGF-β1 (rTGF-β1) (gift from Berlix Biosciences, South San
Francisco, CA); rTGF-β2 and neutralizing monoclonal antibodies
against TGF-β1, TGF-β2 and TGF-β3 (Genzyme, Cambridge, MA);
rTGF-β3, recombinant human interleukin-1 alpha (rIL-1alpha) and
recombinant human platelet-derived growth factor-BB (PDGF-BB)
(R&D Systems, Minneapolis, MN); recombinant human basic
fibroblast growth factor (bFGF) (Synergen Inc., Boulder, CO);
epidermal growth factor (EGF) from mouse submaxillary glands
(Boehringer Mannheim Biochemicals, Indianapolis, IN);
dexamethasone, retinoic acid, and plasmin (Sigma Chemical Co.,
St. Louis, MO); thrombin (Armour Pharmaceutical Co., Kankakee,
IL); and the hematopoietic factors granulocyte-colony stimulating
factor (GCSF), granulocyte-macrophage-colony stimulating factor
(GMCSF), stem cell factor, and IL-3 (Amgen, Thousand Oaks, CA).

The TGF-β quantification assay of this invention was
performed as follows: 1.6 x 10^4 stably transfected MLE cells
per well plated in 96-well tissue culture dishes were allowed
to attach for 3 hours at 37°C in a 5% CO2 incubator. The
medium was replaced with the test sample containing unknown
quantities of TGF-β, or with DMEM, 0.1% BSA (DMEM-BSA)
containing rTGF-β1, rTGF-β2, rTGF-β3, IL-1alpha, PDGF-BB, bFGF,
or EGF, for 14 hours at 37°C. Time courses of exposure to the
samples were performed for optimizing the assay as shown below.
However, in general, approximately 24 hours after addition of
the sample to the transfected cells, the cells were observed
under phase contrast microscopy. In at least one
vector-transfected cell line, Hep3B cells, the presence of TGF-β
in quantities of 0.1 ng/ml or greater in the
sample was detected visually by the change of morphology and
density of the cell population. The untreated cells remained
organized, with cell size decreasing upon confluence until the
cell borders were no longer visible. In the presence of TGF-β,
the untreated cell density was never attained and the cells
were larger, flatter and less organized.

Following visual inspection, cell extracts were prepared
and assayed for luciferase activity using the enhanced
luciferase assay kit (Analytical Luminescence, San Diego, CA)
as per the manufacturer's instructions. Treated cells were
first washed twice with 2 ml phosphate-buffered saline (PBS)
without Ca++ and Mg++ and then extracted with 100 µl of 0.25%
Triton X-100 (cell lysis buffer, Analytical Luminescence). The
plates were gently shaken until the monolayer detached from the
plastic. The plates were then placed on a rotator at room
temperature for 20 minutes.

Eighty µl of the resultant lysates were transferred to a
Microlight 1 96-well plate (Dynatech Laboratories Inc.,
Chantilly, VA) and were analyzed using an ML1000 luminometer
(Dynatech) with 100 µl injections of both Substrates A and B
(Analytical Luminescence). Luciferase activity was reported as
relative light units (RLU) as measured by the light generated
over a ten second period. All assays were performed in
triplicate. Error bars in the collected data represent the
standard error of the mean of the samples.
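
The triplicate readings and their error bars can be summarized as follows. This is a minimal sketch using the Python standard library; the function name and the example RLU values are hypothetical, and the use of the sample (n - 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
import math
import statistics


def mean_and_sem(readings):
    """Return (mean, standard error of the mean) for replicate readings.

    SEM = s / sqrt(n), where s is the sample standard deviation
    (n - 1 denominator) -- an assumption of this sketch.
    """
    n = len(readings)
    if n < 2:
        raise ValueError("at least two replicates are needed for an SEM")
    mean = statistics.fmean(readings)
    sem = statistics.stdev(readings) / math.sqrt(n)
    return mean, sem


if __name__ == "__main__":
    # Hypothetical triplicate RLU readings, for illustration only.
    mean, sem = mean_and_sem([10.0, 12.0, 14.0])
    print(f"{mean:.2f} +/- {sem:.2f} RLU")
```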

To quantitate the amount of TGF-β inducing the measured
amount of luciferase from liquid samples, reference curves were
prepared from parallel assays performed by exposing similarly
transfected cells to a range of known measured amounts of
TGF-β, one or more of the known TGF-β isoforms. Serial dilutions
of the control TGF-β concentrations were prepared from a 1
nanomolar (nM) concentration down to 0.078 picomolar (pM). The
TGF-β assay was performed for each serial dilution and the
resulting expressed luciferase was then determined in a
luminometer. A reference (standard) curve was then generated
by plotting the measured amount of expressed luciferase against
each of the known concentrations of inducing amounts of TGF-β.
The amount of unknown TGF-β in the test liquid sample was then
determined by extrapolating the measured amount of test
luciferase to the reference curve.
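
Reading an unknown off the reference curve can be sketched in a few lines. The example below uses piecewise-linear interpolation between adjacent standards, which is an assumption of this sketch and not a curve-fitting procedure specified in the text; the standards shown are hypothetical values, and readings outside the calibrated range are rejected rather than extended beyond the standards.

```python
from bisect import bisect_left


def concentration_from_rlu(rlu, standards):
    """Estimate a TGF-beta concentration (pM) from a luciferase reading.

    standards -- iterable of (concentration_pM, rlu) pairs from the
                 parallel reference assays; RLU must rise strictly with
                 concentration over the range supplied.
    """
    points = sorted(standards)
    concs = [c for c, _ in points]
    signals = [s for _, s in points]
    if any(b <= a for a, b in zip(signals, signals[1:])):
        raise ValueError("reference curve must be strictly increasing")
    if not signals[0] <= rlu <= signals[-1]:
        raise ValueError("reading lies outside the reference curve")
    i = bisect_left(signals, rlu)
    if signals[i] == rlu:
        return concs[i]
    # Linear interpolation between the two bracketing standards.
    x0, x1 = concs[i - 1], concs[i]
    y0, y1 = signals[i - 1], signals[i]
    return x0 + (rlu - y0) * (x1 - x0) / (y1 - y0)


if __name__ == "__main__":
    # Hypothetical standards (pM, RLU), for illustration only.
    curve = [(0.0, 100.0), (1.0, 300.0), (2.0, 500.0), (4.0, 900.0)]
    print(concentration_from_rlu(400.0, curve))
```

A sample reading above the highest standard would, under this sketch, be diluted and re-assayed rather than estimated.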

B. Sensitivity of the TGF-β Assay Method

To identify the cell type most responsive to TGF-β
for use in the methods of this invention, the p800neoLuc
construct prepared in Example 1A was stably transfected as
described in Example 2B into a variety of cell lines including
MLE cells, HeLa, Chinese hamster ovary (CHO), GM7373 cells
(chemically transformed fetal bovine aortic endothelial cells
obtained from the NIGMS Human Genetic Mutant Cell Repository,
Camden, NJ) and NIH 3T3 cells. After treatment of the
transfected cell lines with recombinantly-produced TGF-β1,
designated rTGF-β1, the cell lysates were assayed for
luciferase activity and protein content. There was a linear
relationship between the luciferase activity and the protein
content of the cell lysates between 0.7 and 14 ng for all of
the cell lines. Nontransfected parental cells demonstrated no
detectable luciferase activity. Of the various cell lines, the
transfected MLE cells demonstrated the greatest sensitivity to
TGF-β. After cloning the transfected MLE cells by limiting
dilution, cells from clone 32 (C32) were found to be the most
sensitive and were used for all subsequent assays.

C32 cells were sensitive to rTGF-β1, β2 and β3 in the
picomolar (pM) to the nanomolar (nM) range as evidenced by
increased luciferase activity in relative light units (RLU) as
shown in Figure 2A. All three isoforms, rTGF-β1, rTGF-β2 and
rTGF-β3, respectively graphed as closed squares, closed circles
and closed triangles, demonstrated good dose-dependent
responses, particularly at low TGF-β concentrations (<4 pM; 100
pg/ml) where the responses were essentially linear (Figure 2B).
rTGF-β3 was the most potent inducer of luciferase activity,
consistent with the observation that MLE cells were most
sensitive to this isoform of TGF-β as described by van
Zonneveld et al., Proc. Natl. Acad. Sci., USA, 85:5525-5529
(1988) (see also Figure 6 as described in Example 3E).

To further assess the dose-dependent responsiveness of
luciferase activity to TGF-β induction, the TGF-β assay was
performed with 8 pM of rTGF-β1, rTGF-β2 or rTGF-β3 in DMEM-BSA
in the presence (partially filled squares) or absence (open
squares) of 100 µg/ml of anti-TGF-β1, anti-TGF-β2 or
anti-TGF-β3 monoclonal antibodies (Genzyme Corp., Cambridge, MA).
As shown in Figure 2C, the induction of luciferase activity by
rTGF-β1, rTGF-β2 and rTGF-β3 was inhibited by the addition of
rTGF-β1, rTGF-β2 and rTGF-β3 neutralizing monoclonal antibodies
as compared to the baseline induction obtained when using
medium alone (filled squares).

The effects of cell culture medium, cell density and assay
incubation time on the sensitivity of the TGF-β assay were also
assessed. To test the effects of cell culture medium, the
TGF-β assay was performed using increasing concentrations of
rTGF-β1 in DMEM (closed squares), alpha-MEM (closed circles),
CMEM (Eagle's medium supplemented with nonessential amino acids;
closed triangles), or RPMI-1640 (closed diamonds). All media
contained 0.1% BSA. The quantification of TGF-β in test
samples was accomplished in the TGF-β assay in all tested media
as shown in Figure 3A, although samples assayed in DMEM yielded
the greatest luciferase activity.

The effect of different cell plating densities on the
induction of luciferase activity by rTGF-β1 was also examined
when transfected cells were maintained in the presence of DMEM.
For this assay, increasing concentrations of rTGF-β1 in DMEM
and 0.1% BSA were measured using 3.2 x 10^4 (closed squares),
1.6 x 10^4 (closed circles), or 0.2 x 10^4 (closed triangles) C32
cells/well after a three hour attachment period. The test
samples were maintained with the transfected cells for 14 hours
prior to assaying for luciferase activity. The results graphed
in Figure 3B show that 1.6 x 10^4 cells/well yielded
the best overall results. Cell densities greater than 1.6 x
10^4 cells/well decreased the sensitivity of the assay at low
TGF-β concentrations and did not significantly increase
sensitivity at higher TGF-β3 levels. Decreasing the
concentration of cells to 0.8 x 10^4 cells/well increased the
sensitivity at low TGF-β3 levels (Figure 3D, inset in Figure
3C) but decreased sensitivity at higher TGF-β concentrations.

Unlike the traditional MLEC assay, where the density of the
cells prior to plating affects the sensitivity, there was
little or no difference whether the cells were 70% confluent,
confluent or 1 day post confluent prior to plating for the
TGF-β assay. The cell attachment and incubation times, however,
did affect the sensitivity. When C32 cells were plated for 2,
3 or 4 hours prior to the addition of samples, a 3 hour plating
time appeared to be optimal. Shorter plating times decreased
sensitivity, whereas longer times had little effect on the
subsequent assay.

Incubation time with the sample also affected the assay.
After a three hour attachment period, 1.6 x 10^4 C32 cells were
incubated with various concentrations of rTGF-β1 ranging from 0
to 50 pM for 6 (closed squares), 14 (closed circles) or 22
hours (closed triangles) prior to assaying for luciferase
activity as shown in Figure 3C. Incubation times of 12-14 hours
were found to give the best results over the widest
concentration range. The sensitivity of cells incubated for 6
hours was not as great at higher TGF-β1 concentrations, whereas
the sensitivity of cells incubated for 22 hours was decreased
at low TGF-β1 concentrations. There also appeared to be a
slight decrease in sensitivity to TGF-β as the cells were
repeatedly passaged (>30 passages). This phenomenon was
observed for the MLEC assay as well.

C. Specificity of the TGF-β Assay Method

After examining the sensitivity of the assay, the
specificity of the TGF-β assay was then examined. Four known
inducers of PAI-1 expression were incubated with C32 cells and
the luciferase activity determined. The inducers tested
included fibroblast growth factor (bFGF) (Saksela et al., J.
Cell Biol., 105:957-963 (1987)), platelet-derived growth factor
(PDGF-BB) (Reilly et al., J. Biol. Chem., 266:9419-9427
(1991)), interleukin-1 alpha (rIL-1alpha) (Schleef et al., J.
Biol. Chem., 263:5797-5803 (1988)) and epidermal growth factor
(EGF) (Seebacher et al., Exp. Cell Res., 203:504-507 (1992) and
Sato et al., Exp. Cell Res., 204:223-229 (1993)). The assay
was performed as described in Example 3A with DMEM-BSA
containing rTGF-β1 (closed squares), recombinant human bFGF
(closed circles), recombinant IL-1alpha (closed triangles),
recombinant PDGF-BB (closed triangles) or EGF (open squares)
ranging in concentration from 0.1 to 500 pM. As seen in Figure
4A, even at high concentrations of these factors (500 pM),
there was little or no induction of luciferase expression
except by PDGF, which demonstrated a slight induction.

Additional inducers of PAI-1, dexamethasone (10^-7 M),
retinoic acid (1 µM), plasmin (0.1 U/ml), thrombin (1 U/ml),
and the hematopoietic factors granulocyte-colony stimulating
factor (10 ng/ml; 525 pM), granulocyte-macrophage-colony
stimulating factor (10 ng/ml; 690 pM), stem cell factor (50
ng/ml; 2.7 nM) and IL-3 (10 ng/ml; 666 pM), were also tested
for their ability to induce luciferase expression in the assay
method of this invention. Only plasmin and thrombin elicited
minor elevations of luciferase activity, which were inhibited by
the addition of aprotinin or hirudin, respectively. Of the
molecules tested in the TGF-β cell assay, only the TGF-βs
demonstrated dose-dependent increases in luciferase expression.

When these factors were tested in the presence of TGF-β1,
a slightly different pattern emerged. These assays were
performed with C32 cells maintained in DMEM/BSA containing 1 pM
rTGF-β1 (closed squares) separately admixed with each of the
growth factors, bFGF (closed circles), recombinant IL-1alpha
(closed triangles), recombinant PDGF (closed diamonds) or EGF
(open squares), ranging in concentration from 0.2 to 500 pM.
The results, graphed in Figure 4B, show that high
concentrations (500 pM) of PDGF-BB and rIL-1alpha increased the
luciferase activity above that induced by TGF-β alone. bFGF had
a similar effect that was observed at lower concentrations.
This induction, maximal at 10 pM bFGF, was abrogated by the
addition of bFGF neutralizing antibodies, and did not increase
at higher concentrations (>10 nM) of bFGF.

Because this enhancement may have resulted from a
bFGF-mediated increase in total cell number and/or protein,
crystal violet staining of parallel cultures and protein assays
of the cell lysates were performed. The normalization of the
amount of protein using these values, however, did not reduce the
luciferase activity in the bFGF plus rTGF-β1-treated cultures
to that of cells treated with rTGF-β1 alone. Interestingly,
uncloned transfected MLE cells were less sensitive to bFGF and
other factors including TGF-β.

Additional TGF-β assays were performed using the ATCC
deposited LUCI cell line containing the p1500Luc expression
vector co-transfected with RSVneo as described in Example 2C to
determine the specificity of activation of the PAI-1 promoter
by other cell activating molecules (agents). The TGF-β assays
were performed as described in Example 3A with the exception
that the p1500Luc vector was used instead of the p800neoLuc
vector. Controls in these assays included the use of two
additional luciferase-expressing vectors that had the
vitronectin (VN) and respiratory syncytial virus (RSV) promoters
in place of the PAI-1 truncated promoter. The molecules used
in the assays included the following (the source and
concentrations are indicated in the parentheses): 1) human
recombinant IL-6 (Boehringer Mannheim, Indianapolis, IN; 500
U/ml); 2) dexamethasone (Sigma Chemical Co.; 10^-5 M); 3) TGF-β
(Berlix Biosciences; 1 ng/ml); 4) lipopolysaccharide (LPS)
(Sigma Chemical Co.; 1 ng/ml); 5) human recombinant alpha
tumor necrosis factor (TNF) (Boehringer Mannheim; 100 ng/ml);
6) human recombinant IL-1 (Sigma Chemical Co.; 50 U/ml); and
7) thrombin (NY State Department of Health, Albany, NY; 10
U/ml).

The assays were performed as indicated in Table 1, in which
the fold induction is indicated as measured by relative light
units of luciferase that resulted from the activation of either
the PAI-1, VN or RSV promoters when exposed to the various
agents.

Table 1

Agents           PAI-1    VN       RSV

Control          1X       1X       1X
IL-6             2X       15X      1X
Dexamethasone    1X       1X       1X
IL-6 + Dex.      6X       26X      2X
TGF-β            147X     1X       2X
LPS              2X       1X       1X
TNF              0.7X     0.3X     0.8X
IL-1             0.9X     0.3X     1X
Thrombin         1X       0.9X     1X

The 1500 bp PAI-1 promoter present in the p1500Luc vector
was slightly responsive to IL-6, LPS and a mixture of IL-6 plus
dexamethasone. In contrast, the induction of luciferase
expression in response to activation by TGF-β was 147-fold over
that seen in the control untreated cells. Furthermore, IL-6
and IL-6 plus dexamethasone were effective activating agents
when used in the presence of a vitronectin promoter. None of
the agents were significantly effective at inducing expression
from the RSV promoter.
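
The fold-induction values in Table 1 express each reading relative to the untreated control. The sketch below shows that calculation and, using the tabulated values themselves, picks out the strongest inducer for a given promoter. The function names, the ASCII agent labels and the example RLU readings are illustrative assumptions; only the fold values are taken from Table 1.

```python
# Fold-induction values transcribed from Table 1 (agent -> promoter -> fold).
TABLE_1 = {
    "Control":       {"PAI-1": 1.0,   "VN": 1.0,  "RSV": 1.0},
    "IL-6":          {"PAI-1": 2.0,   "VN": 15.0, "RSV": 1.0},
    "Dexamethasone": {"PAI-1": 1.0,   "VN": 1.0,  "RSV": 1.0},
    "IL-6 + Dex.":   {"PAI-1": 6.0,   "VN": 26.0, "RSV": 2.0},
    "TGF-beta":      {"PAI-1": 147.0, "VN": 1.0,  "RSV": 2.0},
    "LPS":           {"PAI-1": 2.0,   "VN": 1.0,  "RSV": 1.0},
    "TNF":           {"PAI-1": 0.7,   "VN": 0.3,  "RSV": 0.8},
    "IL-1":          {"PAI-1": 0.9,   "VN": 0.3,  "RSV": 1.0},
    "Thrombin":      {"PAI-1": 1.0,   "VN": 0.9,  "RSV": 1.0},
}


def fold_induction(treated_rlu, control_rlu):
    """Fold induction = treated reading / untreated control reading."""
    if control_rlu <= 0:
        raise ValueError("control reading must be positive")
    return treated_rlu / control_rlu


def strongest_inducer(table, promoter):
    """Return the agent with the highest fold induction for a promoter."""
    return max(table, key=lambda agent: table[agent][promoter])


if __name__ == "__main__":
    # Hypothetical RLU readings, for illustration only.
    print(fold_induction(1470.0, 10.0))
    print(strongest_inducer(TABLE_1, "PAI-1"))
    print(strongest_inducer(TABLE_1, "VN"))
```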

These results confirm that TGF-β is the predominant
activator of the PAI-1 promoter and that the TGF-β assay of
this invention exhibits remarkable specificity. Thus, the
assay is valuable in that TGF-β that has been purified, or even
TGF-β present in unknown quantities in a complex solution
containing many promoter-specific molecules, can be readily
measured without confounding by contaminants.
With the added control of pre-treating the liquid samples with
neutralizing antibodies to TGF-β isoforms, the absolute amounts
of TGF-β as well as the isoform type can be determined.

D. Effects of Serum for Quantifying TGF-β in the
TGF-β Assay Method

To assess the effects of serum on the quantification
of TGF-β, TGF-β assays were performed in the presence of
DMEM-BSA containing rTGF-β1 alone (closed squares), or with 0.5%
(closed circles), 1% (closed triangles), or 2% (closed
diamonds) calf serum. The rTGF-β1 concentrations in the assays
ranged from 0 to 8 pM. As shown in Figure 4C, serum
enhanced the induction of the PAI/L construct by rTGF-β1 in a
manner similar to that of the purified growth factors shown in
Example 3C. At low rTGF-β1 concentrations (<1 pM), addition of
0.5, 1 or 2% serum had little effect on the luciferase activity.
As the rTGF-β1 concentration was increased, the serum-containing
curves were shifted upwards, possibly as a result of growth
factors such as bFGF in the serum.

E. Comparison of the TGF-β Assay with the
MLEC Assay and the Radioreceptor Assay for
Quantifying TGF-β

Quantification of TGF-β in a defined medium (DMEM-BSA)
lacking growth factors or serum, as demonstrated in Example 3D,
is, however, rarely called for in the laboratory. For this
reason, TGF-β assays were also performed in COS, BSM and BAE cell
conditioned medium (CM), all of which normally contain latent
but little, if any, active TGF-β. These samples were tested
using the TGF-β assay method of this invention in comparison
with the MLEC assay (mink lung epithelial cell tritiated
thymidine uptake cell assay).

The TGF-β assay was performed as described in Example 3A
with rTGF-β1 ranging in concentration from 0 to 40 pM in the
presence of either DMEM-BSA (closed squares), COS CM (crosses),
BSM CM (closed triangles) or BAE CM (closed circles). To
prepare conditioned medium, BAE cells were cultured in alphaMEM
medium (Bio-Whittaker, Walkersville, MD) containing 5% fetal
calf serum. BSM and COS cells were cultured in DMEM
supplemented with 10% calf serum (Bio-Whittaker). Conditioned
medium was prepared by a 24 hour incubation of the indicated
cells with DMEM containing 0.1% pyrogen-poor BSA
(weight/volume) (Pierce, Rockford, IL). All media were
supplemented with L-glutamine (2 mM), penicillin G (100 U/ml)
and streptomycin sulfate (100 µg/ml) (Irvine Scientific, Santa
Ana, CA).

The MLEC assay was performed essentially as described by
Lucas et al., In Peptide Growth Factors, Barnes et al., Eds.,
Academic Press Inc., 198:303-316 (1991). Briefly, 100 µl
aliquots of the samples were placed in 96-well plates
containing 10^4 MLE cells per well in 100 µl of assay buffer
(DMEM containing 0.25% fetal calf serum and 10 mM HEPES).
After 20 hours at 37°C, 1 µCi of 3H-thymidine (6.7 Ci/mmol, Du
Pont Co., Boston, MA) in 20 µl of the assay buffer was added to
each well, and the plates were incubated an additional 4 hours.
The cells were harvested by incubation with 100 µl of 0.25%
trypsin/1 mM EDTA at 37°C for 15 minutes, transferred onto glass
fiber filters, and placed into vials containing liquid
scintillation solution. The amount of radioactivity was
quantified with a Beckman LS 3801 β-scintillation counter
(Fullerton, CA).

As clearly shown by the data indicated by the unbroken
lines in Figure 5, both BAE and BSM CM contained factors that
stimulated thymidine incorporation in the MLEC assay 5-6 fold.
Only at rTGF-β1 levels greater than or equal to 1 pM was the
3H-thymidine incorporation suppressed to a level equal to that
of non-conditioned medium (DMEM-BSA). In contrast, COS CM
contained factors that strongly inhibited 3H-thymidine
incorporation. With all three of these CM, calculation of
TGF-β concentration would be very difficult using 3H-thymidine
incorporation. In contrast, when different CM were used in the
TGF-β assay, as indicated in Figure 5 with the data plotted with
broken lines, there were also slight changes, but these
differences were much less significant than those seen with the
MLEC assay. BAE CM, which contains bFGF, shifted the response
curve to higher values. BSM and COS CM had only minor effects
on the standard curves.

When bFGF (closed diamonds), EGF (open circles), PDGF-BB
(open triangles), rIL-1alpha (open squares), and the TGF-βs
(rTGF-β1 (closed squares), rTGF-β2 (closed circles), and
rTGF-β3 (closed triangles)) were tested for their ability to
affect 3H-thymidine incorporation into non-transfected MLE cells
in the MLEC assay performed as described above, more striking
effects were observed as shown in Figure 6. The three TGF-β
isoforms, especially TGF-β3, decreased 3H-thymidine
incorporation as expected. IL-1alpha and PDGF-BB had little
effect, but bFGF and EGF had strong dose-dependent stimulatory
effects on 3H-thymidine incorporation. Such effects can make
the MLEC assays inaccurate and difficult to analyze.

F. Quantification of Total TGF-β Levels in Activated
Conditioned Medium

In order to analyze total levels of TGF-β, BAE CM
collected after 12 or 24 hours was heat treated at 80°C for
10-12 minutes to activate endogenous latent TGF-β as described by
Brown et al., Growth Factors, 3:35-43 (1990). After cooling, the
samples were diluted to 5, 10 or 20% of their original
concentration with DMEM-BSA and were quantified using the TGF-β
assay. TGF-β concentrations of 23.4±3.4 pM (12 hour CM) and
122.1±16 pM (24 hour CM) were determined via comparison with a
rTGF-β standard reference curve generated from plotting the
detected amounts of luciferase activity that resulted from a
range of predetermined amounts of TGF-β as described in Example
3A.
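
Because the heat-activated samples were assayed after dilution, each reading taken from the reference curve has to be scaled back to the undiluted medium. A minimal sketch of that arithmetic follows; the function name and the example readings are hypothetical, and averaging the back-calculated values across dilutions is an assumption of this sketch rather than a step stated in the text.

```python
import statistics


def undiluted_concentration(measured_pM, percent_of_original):
    """Scale a concentration measured in a diluted sample back to the
    undiluted medium.

    percent_of_original -- e.g. 5, 10 or 20 for samples diluted to
                           5, 10 or 20% of their original concentration.
    """
    if not 0 < percent_of_original <= 100:
        raise ValueError("percent_of_original must be in (0, 100]")
    return measured_pM * 100.0 / percent_of_original


if __name__ == "__main__":
    # Hypothetical readings (pM) at the three dilutions, for illustration.
    readings = {5: 1.0, 10: 2.0, 20: 4.0}
    estimates = [undiluted_concentration(c, pct) for pct, c in readings.items()]
    print(estimates, statistics.fmean(estimates))
```

Agreement between the estimates from different dilutions is a useful check that the diluted readings fall within the linear part of the reference curve.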

The heat-activated CM were also assayed using the highly
specific radioreceptor assay as described by Kojima et al., J.
Cell. Physiol., 155:323-332 (1993), the disclosure of which is
hereby incorporated by reference. Briefly, murine AKR-2B
fibroblasts at 1 x 10^5 cells/well were plated in a 24-well
plate in McCoy's 5A medium (Gibco BRL) supplemented with 5%
fetal calf serum. The following day, the cells were washed 3
times with binding buffer (McCoy's 5A, 0.1% BSA, 25 mM HEPES at
pH 7.4) and were pre-incubated in 250 µl of binding buffer for
1 hour at room temperature. The medium was removed, and the
cells were incubated for 2 hours at room temperature in a
mixture of 125 µl of binding buffer containing 50 pM
125I-rTGF-β1 and an equal volume of heat-activated (80°C for 10
minutes) BAE CM or serial dilutions of cold rTGF-β1. The cells
were washed 3 times with binding buffer, and the bound
radioactivity was solubilized in cell lysis buffer (Analytical
Luminescence) and was measured in a Packard Multi-PRIAS 1 gamma
counter (Meriden, CT). The radioreceptor assay was sensitive
between 0.0004 and 2 nM rTGF-β1.

In the radioreceptor assay, concentrations of 24±1.1 pM (12
hour CM) and 128±48.8 pM (24 hour CM) were calculated. The
essentially identical amounts of TGF-β in conditioned medium
measured by the TGF-β assay described above and by the
radioreceptor assay verify the accuracy and specificity of the
TGF-β assay of this invention.
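
The agreement between the two assays can be made concrete with a short calculation. The sketch below uses the mean concentrations reported above; expressing agreement as a difference relative to the mean of the two values is an assumption made for illustration and is not a statistic used in this specification.

```python
def percent_difference(a, b):
    """Absolute difference relative to the mean of the two values, in percent."""
    return abs(a - b) / ((a + b) / 2) * 100


# Mean concentrations (pM) reported for the two assays.
luciferase_assay = {"12 hour CM": 23.4, "24 hour CM": 122.1}
radioreceptor_assay = {"12 hour CM": 24.0, "24 hour CM": 128.0}

for sample, value in luciferase_assay.items():
    diff = percent_difference(value, radioreceptor_assay[sample])
    print(f"{sample}: {diff:.1f}% difference between assays")
```

Both differences fall well inside the reported standard deviations of the individual measurements.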

Thus, a highly sensitive and specific, non-radioactive assay
for mature TGF-β has now been developed. When compared to the
sensitive and widely used MLEC method for measuring TGF-β
concentration, the TGF-β assay was more rapid, had comparable
sensitivity, and had a greater detection range. Specificity of
this assay was also higher as evidenced by its relative
insensitivity to factors such as EGF and bFGF which can greatly
affect other assays. The most remarkable example of the TGF-β
assay specificity was observed with COS cell CM which
completely inhibited the MLEC assay, while having no
detrimental effects in the TGF-β assay.

In addition to the TGF-β assay of this invention and the MLEC
and radioreceptor assays described herein, other assays have
been used to detect mature TGF-β including anchorage-independent
growth assays, differentiation-based assays, cell migration and
plasminogen activity assays, radioimmunoassays and enzyme-linked
immunosorbent assays. Although all of these assays can detect
mature TGF-β, the low concentrations of TGF-β, generally less
than 2 pM, generated in many biological systems make many of
them impractical without prior concentration of the sample,
which can result in large losses of the mature growth factor or
even activation of latent TGF-β. The TGF-β assay of this
invention overcomes these deficiencies by being highly sensitive
and specific as well as nonradioactive. The specificity and
sensitivity of the assay are the result of using a truncated
PAI-1 promoter beginning at -800 and extending through +76 of
the PAI-1 5' promoter that retains two regions responsible for
maximal response to TGF-β as described by Keeton et al., J.
Biol. Chem., 266:23048-23052 (1991). Use of the complete PAI-1
promoter and upstream elements results in decreased specificity,
as responsive elements for other molecules present in complex
solutions may be activated or inhibited, deleteriously affecting
the ability to quantify TGF-β. Moreover, the truncated PAI-1
promoter used above has been further fragmented into smaller,
more specific TGF-β response elements as described in Example 4
to enhance specificity and increase the sensitivity of the
TGF-β assay method.

When the TGF-β assay is compared to the sensitive and widely
used MLEC assay for quantifying TGF-β concentrations, the TGF-β
assay was more rapid and had comparable sensitivity but a
greater detection range. Specificity of the assay was also
higher as evidenced by the TGF-β assay's insensitivity to growth
factors such as EGF and bFGF that have been shown to greatly
affect other assays. The most striking example of the
specificity of the TGF-β assay was observed with the COS cell
line conditioned medium that completely inhibited the MLEC
assay while having no detrimental effects in the TGF-β assay as
shown in Figure 5.

Although the TGF-β assay is not isoform specific, use of the
appropriate standard reference curves and addition of
neutralizing antibodies to the various TGF-β species allows for
quantification of unique isoforms. While the TGF-β assay of
this invention is highly specific, highly specific neutralizing
antibodies to TGF-β were used to verify that no other molecules
were present in test liquid samples that may have affected the
quantitation of TGF-β in the assay. Considering its large range
and specificity, this rapid, sensitive, non-radioactive, easily
performed assay is of invaluable use in determining active
TGF-β concentrations in complex solutions, particularly so with
the use of parallel assays with neutralizing antibodies to
TGF-β in complex unknown samples to verify that no other
molecules are present that can affect the assay through either
inhibition or activation of other regions of the truncated
PAI-1 promoter.

4. Quantifying TGF-β with Cells Transiently Transformed
with Expression Vectors Having Shorter Fragments of
the Promoter Containing TGF-β Response Elements

The regulation of PAI-1 by TGF-β appears to affect a number of
biological systems, and the mechanism of transcriptional
regulation by TGF-β has been studied by a number of groups. For
example, the autoinduction of the TGF-β1 promoter suggests a
feedback loop designed to amplify the response to TGF-β under
certain conditions. This response was shown to involve specific
AP-1 sites. AP-1 is a heterodimeric complex of Fos and Jun
protein subunits that binds to specific DNA enhancer sites
which have the consensus sequence TGASTCA (SEQ ID NO 26), where
S can be either G or C. AP-1 is believed to mediate the
transcriptional effects of the tumor promoting phorbol esters.

In contrast to these results, the TGF-β response sequence in
the promoter for type 1 collagen has been localized to a
sequence with homology to a nuclear factor 1 (NF-1) binding
site. A number of different consensus sequences for NF-1 have
been described and these include the sequences TGGN7GCCAA (SEQ
ID NO 27), where N can be either A, C, G or T, and TGGCA (SEQ
ID NO 28). The effect of TGF-β on the PAI-1 promoter has been
studied, resulting in the demonstration that the responsive
regions contain sequences with homology to the AP-1 consensus
sequence.
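
The degenerate consensus sequences named above can be searched for in a DNA string with a short script. The following is a minimal sketch, assuming that N7 denotes seven consecutive unspecified bases and that only the sense strand is scanned; the helper names `motif_to_regex` and `find_sites` are hypothetical.

```python
import re

# Ambiguity codes used in the consensus sequences above.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "[CG]", "N": "[ACGT]"}


def motif_to_regex(motif):
    """Translate a degenerate DNA motif into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif))


AP1 = motif_to_regex("TGASTCA")                    # SEQ ID NO 26
NF1_LONG = motif_to_regex("TGG" + "N" * 7 + "GCCAA")  # SEQ ID NO 27 (N7 read as 7 x N)
NF1_SHORT = motif_to_regex("TGGCA")                # SEQ ID NO 28


def find_sites(pattern, sequence):
    """Return 0-based start positions of all matches, including overlapping ones."""
    lookahead = f"(?={pattern.pattern})"
    return [m.start() for m in re.finditer(lookahead, sequence.upper())]
```

The zero-width lookahead is used so that overlapping occurrences are reported; a plain `finditer` on the motif itself would skip a site that begins inside the previous match.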

To determine the role of AP-1 in the regulation of the PAI-1
promoter in more detail and to identify smaller TGF-β
responsive regions within the PAI-1 promoter of the p800neoLuc
expression vector prepared in Example 1 for use in quantifying
TGF-β in Example 3, the effect of both TGF-β and AP-1 on the
activity of a 25 bp fragment corresponding to the PAI-1
promoter between -674 and -650 in the 5' flanking region was
evaluated. This fragment contained one of the AP-1 like
sequences that responded to TGF-β. The expression vectors for
use in assessing the requirement for AP-1, including the one
containing the 25 bp fragment, were prepared as described in
Example 1C.

A. TGF-β Activation of PAI-1 Promoter Fragments

AP-1 like sites are located within each of three regions of the
5' flanking region of the PAI-1 promoter: from -87 to -49, from
-674 to -636 and from -740 to -703. Oligonucleotides having
portions or all of these regions were synthesized and cloned
into a pUC-luciferase expressing plasmid containing the minimal
promoter as described in Example 1C. The resultant plasmids
were transiently transfected into recipient Hep3B cells as
described in Example 2C and evaluated for their response to
TGF-β as measured by luciferase expression as described in
Example 3A. The plasmid designated p56Luc contained an
oligonucleotide sequence that corresponded to -56 to -41 of the
PAI-1 promoter gene (also referred to as region A) and
conferred a 10-fold induction with TGF-β as compared to a
3-fold induction obtained with a plasmid expression vector
containing only the minimal promoter sequence.

Another plasmid designated p674Luc, deposited with ATCC and
having ATCC Accession Number 75627, contained an
oligonucleotide sequence 25 bp in length that corresponded to
-674 to -650 of the PAI-1 promoter (also referred to as region
B). This nucleotide sequence conferred a 70-fold induction on
the minimal promoter. The plasmid designated p743Luc contained
an oligonucleotide sequence 35 bp in length that corresponded
to -743 to -708 of the PAI-1 promoter (also referred to as
region C). This nucleotide sequence conferred a 35-fold
induction in the promoter. The plasmid designated p732Luc
exhibited 62-fold induction while the plasmid, p732HBV, having
the hepatitis B virus (HBV) minimal promoter sequence instead
of the PAI-1 sequence exhibited 47-fold induction.

This result is in comparison to 6-fold basal induction from a
control plasmid having only the HBV minimal promoter without
any TGF-β response elements. The nucleotide sequences of the
sense strand of the HBV-minimal promoter-containing plasmid
having or lacking the neomycin selectable marker gene are
listed respectively in SEQ ID NOs 23 and 24. In parallel
assays, the p800Luc plasmid that contained 3 AP-1-like
sequences conferred greater than 150-fold induction of TGF-β
responsiveness as compared to the minimal promoter sequence.
The stably transformed p1500Luc similarly resulted in
approximately 150-fold induction. These results as well as the
others presented in the Examples represent the average of at
least 4 independent experiments, each performed in duplicate.
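
One way to arrive at an averaged fold-induction value from replicate luciferase readings is sketched below. The calculation (mean of duplicate treated wells divided by mean of duplicate untreated wells, then averaged across independent experiments) is an assumption made for illustration, since the specification does not spell out the arithmetic; the function names are hypothetical and the readings shown in the usage note are invented.

```python
from statistics import mean


def fold_induction(treated, untreated):
    """Fold induction for one experiment from replicate luciferase readings."""
    return mean(treated) / mean(untreated)


def mean_fold_induction(experiments):
    """Average fold induction over independent experiments.

    `experiments` is a list of (treated_replicates, untreated_replicates)
    pairs, one pair per independent experiment.
    """
    return mean([fold_induction(t, u) for t, u in experiments])
```

For example, `mean_fold_induction([([100.0, 120.0], [10.0, 12.0]), ([210.0, 190.0], [20.0, 20.0])])` averages two hypothetical experiments, each run in duplicate.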

Regions A and C contained only a single AP-1 like sequence
whereas region B contained 2 AP-1 like binding sequences.
Thus, oligonucleotides containing AP-1 like sequences from each
region were able to confer TGF-β responsiveness to a
non-responsive minimal promoter.

B. Responsiveness of the TGF-β Responsive Regions
A, B and C to c-fos/c-jun

In order to directly test the response of the p56Luc, p674Luc
and p743Luc plasmids to AP-1, they were cotransfected into
Hep3B cells with plasmids containing the mouse genes for c-fos
and c-jun under the control of the RSV promoter. All three of
these regions showed a dose dependent response to increasing
amounts of c-fos/c-jun, with maximum responses seen using 0.1
µg/well of c-fos and c-jun plasmids. This response was
dependent on co-transfection of both plasmids since neither
c-fos nor c-jun alone was able to cause this induction.

C. Detailed Analysis of the TGF-β Responsive
Nucleotide Sequence in the PAI-1 Promoter
from Nucleotide -743 to -708 (Region C)
To find the minimal TGF-β responsive sequence in the PAI-1
promoter region from nucleotide position -743 to -708, the
sequence of which is listed in SEQ ID NO 16, two
oligonucleotides were made, the first from the 3' side of
region C which contained the AP-1 like sequence (C2: -723 to
-708 corresponding to the sequence in SEQ ID NO 16 from 21 to
36) and the second from the remaining 5' sequence (C3: -743 to
-727 corresponding to the sequence in SEQ ID NO 16 from 1 to
17). When the oligonucleotides were examined for response to
TGF-β, neither the C2 nor the C3 sequence showed maximal
induction with TGF-β (10-fold and 3-fold induction,
respectively) as compared to region C itself (25-fold
induction). This result suggested that a portion of a TGF-β
responsive binding site located between -723 and -727 was
deleted. The 5' side of C2 was then progressively extended to
include bases between -723 and -728 (7-fold induction), but
this did not improve the TGF-β response. However, when this
region was extended another 4 bp there was a dramatic increase
in the TGF-β response (63-fold induction), indicating that this
region was crucial to this response.
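
The correspondence between promoter coordinates and positions in SEQ ID NO 16 quoted above follows from a fixed offset, since position 1 of the listed sequence is nucleotide -743. A minimal sketch, with hypothetical function names, assuming all coordinates lie upstream of the transcription start site so that no position 0 has to be skipped:

```python
REGION_C_START = -743  # promoter coordinate of position 1 in SEQ ID NO 16


def to_seq_index(promoter_pos, region_start=REGION_C_START):
    """Convert a promoter coordinate to a 1-based position in the listed sequence."""
    return promoter_pos - region_start + 1


def fragment_length(start, end):
    """Number of bases in an inclusive promoter interval, e.g. -732 to -708."""
    return end - start + 1


# C2 (-723 to -708) and C3 (-743 to -727) as positions in SEQ ID NO 16.
c2 = (to_seq_index(-723), to_seq_index(-708))
c3 = (to_seq_index(-743), to_seq_index(-727))
```

The same offset arithmetic applies to any of the fragments once the coordinate of the first listed base is known.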

D. Site-Specific Mutations of the PAI-1 Promoter
from Nucleotide -732 to -708, Region C5

To assess the role of the AP-1 site compared to the 5' TGF-β
responsive site, the response of the minimal promoter having
the 5' flanking region of the PAI-1 promoter from -39 to +76 to
direct stimulation with c-fos/c-jun was determined. It showed
10-fold induction with AP-1 compared to only 3-fold induction
with TGF-β. When C5 was tested in a similar manner there was
only a 2-fold increase above the vector background induced by
c-fos/c-jun compared to a greater than 20-fold increase above
background seen with TGF-β (C5 itself showed 63-fold
induction). Thus, although the wild type AP-1 site in C5 was
only a relatively poor responsive sequence to c-fos/c-jun, this
region still showed a strong response to TGF-β. The AP-1 site
was therefore mutated to produce a consensus AP-1 sequence
(TGACACA to TGAGTCA, SEQ ID NOs 29 and 30, respectively) and
the response of the mutant to both c-fos/c-jun and TGF-β was
compared. This mutation increased the AP-1 response from
19-fold to 105-fold but did not improve the TGF-β response. In
fact, a consistent decrease was seen in the TGF-β response
following this mutation (63-fold induction with TGF-β for the
wild type AP-1 like site to 30-fold for the consensus AP-1
site).

The AP-1 like site was then mutated by changing the critical
TGA bases, a change shown by others to decrease the activity of
the AP-1 binding site. Although this mutation had the expected
effect of abolishing the AP-1 response, it did not completely
abolish the response of this construct to TGF-β (10-fold
induction with c-fos/c-jun [i.e., vector background] but a
13-fold induction with TGF-β [i.e., 5-fold above vector
background]).

This result once again suggested that the 5' portion of C5
(-732 to -708) was more critical than the AP-1 like sequence in
mediating the TGF-β response. To further test this hypothesis,
4 bp between -728 and -732 were mutated (the resultant mutated
vector was designated C8), since the previous deletion results
suggested that this sequence was critical to the TGF-β
response. A 3 bp sequence between -726 and -728 was also
mutated (the resultant vector was designated C9). As expected,
both of these 5' mutations caused dramatic reductions in the
response of C5 to TGF-β (60-fold to 4-fold for both C8 and C9).
These changes had little effect on the AP-1 response, which
decreased only slightly from 19-fold to 13-fold. A double
mutation of both of these sites was also created and this
abolished both the TGF-β and the AP-1 activity.

E. Heterologous Promoter Induction

To test whether the 25 bp oligonucleotide from the PAI-1
promoter region C5, -732 to -708 (SEQ ID NO 15), was able to
activate a heterologous promoter, it was cloned into a
hepatitis B viral promoter, the latter of which had the
nucleotide sequence from -188 to +145 of the viral promoter
(SEQ ID NO 19). Control experiments found that the viral
promoter construct alone showed 28-fold induction with fos/jun.
However, the viral promoter showed only 4-fold induction with
TGF-β. Thus, even though the hepatitis B viral promoter had
active AP-1 like sites, these were not sufficient for a strong
TGF-β response.

The region between -708 and -732 of the PAI-1 promoter (C5) was
then cloned into the viral promoter and the resultant construct
was tested as above. The 25 bp PAI-1 fragment was able to
dramatically increase the TGF-β response of the viral promoter
from 4-fold to 47-fold but did not alter the AP-1 response
(25-fold compared to 28-fold). Finally, mutation of bases
between -732 and -728 of the PAI-1 promoter oligonucleotide
dramatically reduced the TGF-β induction of this fragment but
did not lower the response to AP-1.

F. AP-1-Independent TGF-β Induction

To determine if the 5' portion of the -732 to -708 nucleotide
sequence from the PAI-1 promoter could function independently
of the AP-1 site in the TGF-β response, a 15 bp oligonucleotide
containing bases between -732 and -718 (corresponding to the
nucleotide sequence from position 1 to 15 in SEQ ID NO 17),
which excludes the AP-1 like site, was cloned into a
pUC-luciferase expression vector having the minimal PAI-1
promoter. This 15 bp sequence was able to confer 20-fold
induction with TGF-β with the minimal PAI-1 promoter and did
not show any AP-1 activity.

With regard to the AP-1 like sites involved in this response,
unlike the consensus sequence for AP-1 (TGASTCA, where S is G
or C (SEQ ID NO 26)), the most active sequences from the PAI-1
promoter all have the sequence TGA(N)ACA, where N is either A,
C, G or T (SEQ ID NO 31) (PAI-1 promoter: -717 to -711 =
TGACACA (SEQ ID NO 29); -659 to -653 = TGATACA (SEQ ID NO 32)).
It is possible that the T to A substitution may affect the
binding affinity enough to preferentially bind a protein other
than c-fos/c-jun. This is consistent with the functional data
on the AP-1 like site of the PAI-1 promoter (between -717 and
-711), which indicate that the wild type sequence is a poor
AP-1 binding site and yet is still important in the TGF-β
response.
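
The difference between the PAI-1 sites and the AP-1 consensus can be checked position by position. The sketch below is illustrative only; the function name `mismatches` is hypothetical, and the comparison treats S and N as the ambiguity codes defined in the text.

```python
# Bases permitted by each symbol used in the consensus sequences.
ALLOWED = {"A": "A", "C": "C", "G": "G", "T": "T", "S": "CG", "N": "ACGT"}


def mismatches(site, consensus):
    """Return the 1-based positions at which `site` departs from `consensus`."""
    if len(site) != len(consensus):
        raise ValueError("site and consensus must have the same length")
    return [
        pos
        for pos, (base, symbol) in enumerate(zip(site.upper(), consensus), start=1)
        if base not in ALLOWED[symbol]
    ]


AP1_CONSENSUS = "TGASTCA"   # SEQ ID NO 26
PAI1_MOTIF = "TGANACA"      # SEQ ID NO 31

wild_type = mismatches("TGACACA", AP1_CONSENSUS)   # site at -717 to -711
region_b = mismatches("TGATACA", AP1_CONSENSUS)    # site at -659 to -653
mutant = mismatches("TGAGTCA", AP1_CONSENSUS)      # mutated consensus site
```

Both PAI-1 sites depart from the AP-1 consensus at position 5, the T to A substitution discussed above, while both satisfy the TGA(N)ACA motif.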

The mutation and deletion data of the 25 bp sequence from the
wild type PAI-1 promoter (-732 to -708) suggested that the 5'
side of the oligonucleotide may contain a second binding site
of importance in the TGF-β response. In fact this region
appeared to be more critical than the AP-1 sequence since
mutation of this region almost completely abolished the TGF-β
response even though the AP-1 region was intact. When this
sequence alone was evaluated, it was able to act independently
of the AP-1 site and promote strong TGF-β induction of the
normally unresponsive minimal promoter. However, the full TGF-β
response was dependent on the functional activity of both the
AP-1 like site and the 5' site. When the sequence of the 5' 15
bp sequence was compared to the other region of the PAI-1
promoter which also showed strong TGF-β induction (region B =
60-fold), a sequence was found that was common to both of these
regions (CCNTGTNT, where N is either A, C, G or T (SEQ ID NO

In summary, the TGF-β response of the PAI-1 promoter has been
localized to specific AP-1 like sites. However, the full TGF-β
response of this region of the PAI-1 promoter is dependent on
the interaction of two binding sites. The first site has
homology to an AP-1 site but does not appear to bind AP-1.
While this site is not essential it is required for the full
TGF-β induction from this region. The second site, located 5'
to the AP-1 site, appears to be critical in the TGF-β response.
This site is 15 bp in size and contains a motif that is present
in both active regions of the PAI-1 promoter as well as in the
most responsive region of the TGF-β promoter. This novel
sequence does not appear to match any previously described
transcription factor binding sites and may represent a new and
specific binding site which is critical for a strong TGF-β
response.

5. Deposit of Materials

The plasmids, p674Luc, p743Luc and p732Luc, were deposited on
or before December 16, 1993, with the American Type Culture
Collection, 1301 Parklawn Drive, Rockville, MD, USA (ATCC) and
assigned the respective ATCC Accession Numbers ATCC 75627, ATCC
75628 and ATCC 75629. The cell line, Hep3B, stably transfected
with plasmid p1500Luc to form a transformed cell line
designated LUCI, was also deposited on or before December 16,
1993 with ATCC and assigned the ATCC Accession Number CRL
11508. The deposit thus provides plasmids and a stably
transfected cell line containing plasmid p1500Luc. These
deposits were made under the provisions of the Budapest Treaty
on the International Recognition of the Deposit of
Microorganisms for the Purpose of Patent Procedure and the
Regulations thereunder (Budapest Treaty). This assures
maintenance of viable plasmids and cell lines for 30 years from
the date of deposit. The plasmids and cell line will be made
available by ATCC under the terms of the Budapest Treaty which
assures permanent and unrestricted availability of the progeny
of the culture to the public upon issuance of the pertinent
U.S. patent or upon laying open to the public of any U.S. or
foreign patent application, whichever comes first, and assures
availability of the progeny to one determined by the U.S.
Commissioner of Patents and Trademarks to be entitled thereto
according to 35 U.S.C. §122 and the Commissioner's rules
pursuant thereto (including 37 CFR §1.14 with particular
reference to 886 OG 638). The assignee of the present
application has agreed that if the plasmid or cell line
deposits should die or be lost or destroyed when cultivated
under suitable conditions, they will be promptly replaced on
notification with a viable specimen of the same plasmid or cell
culture. Availability of the deposited plasmids is not to be
construed as a license to practice the invention in
contravention of the rights granted under the authority of any
government in accordance with its patent laws.

The foregoing written specification is considered to be
sufficient to enable one skilled in the art to practice the
invention. The present invention is not to be limited in scope
by the plasmids deposited, since the deposited embodiment is
intended as a single illustration of one aspect of the
invention and any plasmids that are functionally equivalent are
within the scope of this invention. The deposit of material
does not constitute an admission that the written description
herein contained is inadequate to enable the practice of any
aspect of the invention, including the best mode thereof, nor
is it to be construed as limiting the scope of the claims to
the specific illustration that it represents. Indeed, various
modifications of the invention in addition to those shown and
described herein will become apparent to those skilled in the
art from the foregoing description and fall within the scope of
the appended claims.

SEQUENCE LISTING



(1) GENERAL INFORMATION:

(i) APPLICANT:
(A) NAME: The Scripps Research Institute
(B) STREET: 10666 North Torrey Pines Road
(C) CITY: La Jolla
(D) STATE: CA
(E) COUNTRY: USA
(F) POSTAL CODE (ZIP): 92037
(G) TELEPHONE: 619-554-2937
(H) TELEFAX: 619-554-6312

(ii) TITLE OF INVENTION: A NEW SENSITIVE METHOD FOR QUANTIFYING
ACTIVE TRANSFORMING GROWTH FACTOR-BETA AND COMPOSITIONS
THEREFOR

(iii) NUMBER OF SEQUENCES: 33

(iv) COMPUTER READABLE FORM:
(A) MEDIUM TYPE: Floppy disk
(B) COMPUTER: IBM PC compatible
(C) OPERATING SYSTEM: PC-DOS/MS-DOS
(D) SOFTWARE: PatentIn Release #1.0, Version #1.25 (EPO)

(v) CURRENT APPLICATION DATA:
(A) APPLICATION NUMBER: PCT/US 95/
(B) FILING DATE: 25-JAN-1995

(vi) PRIOR APPLICATION DATA:
(A) APPLICATION NUMBER: US 08/188,227
(B) FILING DATE: 25-JAN-1994

(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 11293 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000



TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120



WO 95/19987 

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180
TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240
AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300
CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360
TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420
ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480
CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540
TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600
GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660
ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720
CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780
TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840
TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900
TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960
CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020
CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080
CAAGCTTACC ATGGTAACCC CTGGTCCCGT TCAGCCACCA CCACCCCACC CAGCACACCT 7140
CCAACCTCAG CCAGACAAGG TTGTTGACAC AAGAGAGCCC TCAGGGGCAC AGAGAGAGTC 7200
TGGACACGTG GGGAGTCAGC CGTGTATCAT CGGAGGCGGC CGGGCACATG GCAGGGATGA 7260
GGGAAAGACC AAGAGTCCTC TGTTGGGCCC AAGTCCTAGA CAGACAAAAC CTAGACAATC 7320
ACGTGGCTGG CTGCATGCCT GTGGCTGTTG GGCTGGGCAG GAGGAGGGAG GGGCGCTCTT 7380
TCCTGGAGGT GGTCCAGAGC ACCGGGTGGA CAGCCCTGGG GGAAAACTTC CACGTTTTGA 7440
TGGAGGTTAT CTTTGATAAC TCCACAGTGA CCTGGTTCGC CAAAGGAAAA GCAGGCAACG 7500
TGAGCTGTTT TTTTTTTCTC CAAGCTGAAC ACTAGGGGTC CTAGGCTTTT TGGGTCACCC 7560
GGCATGGCAG ACAGTCAACC TGGCAGGACA TCCGGGAGAG ACAGACACAG GCAGAGGGCA 7620
GAAAGGTCAA GGGAGGTTCT CAGGCCAAGG CTATTGGGGT TTGCTCAATT GTTCCTGAAT 7680
GCTCTTACAC ACGTACACAC ACAGAGCAGC ACACACACAC ACACACACAT GCCTCAGCAA 7740
GTCCCAGAGA GGGAGGTGTC GAGGGGGACC CGCTGGCTGT TCAGACGGAC TCCCAGAGCC 7800
AGTGAGTGGG TGGGGCTGGA ACATGAGTTC ATCTATTTCC TGCCCACATC TGGTATAAAA 7860
GGAGGCAGTG GCCCACAGAG GAGCACAGCT GTGTTTGGCT GCAGGGCCAA GAGCGCTGTC 7920
AAGAAGACCC ACACGCCCCC CTCCAGCAGC TGAATTCCAG CTGGCATTCC GGTACTGTTG 7980
GTAAAATGGA AGACGCCAAA AACATAAAGA AAGGCCCGGC GCCATTCTAT CCTCTAGAGG 8040
ATGGAACCGC TGGAGAGCAA CTGCATAAGG CTATGAAGAG ATACGCCCTG GTTCCTGGAA 8100
CAATTGCTTT TACAGATGCA CATATCGAGG TGAACATCAC GTACGCGGAA TACTTCGAAA 8160
TGTCCGTTCG GTTGGCAGAA GCTATGAAAC GATATGGGCT GAATACAAAT CACAGAATCG 8220
TCGTATGCAG TGAAAACTCT CTTCAATTCT TTATGCCGGT GTTGGGCGCG TTATTTATCG 8280
GAGTTGCAGT TGCGCCCGCG AACGACATTT ATAATGAACG TGAATTGCTC AACAGTATGA 8340
ACATTTCGCA GCCTACCGTA GTGTTTGTTT CCAAAAAGGG GTTGCAAAAA ATTTTGAACG 8400
TGCAAAAAAA ATTACCAATA ATCCAGAAAA TTATTATCAT GGATTCTAAA ACGGATTACC 8460
AGGGATTTCA GTCGATGTAC ACGTTCGTCA CATCTCATCT ACCTCCCGGT TTTAATGAAT 8520
ACGATTTTGT ACCAGAGTCC TTTGATCGTG ACAAAACAAT TGCACTGATA ATGAATTCCT 8580
CTGGATCTAC TGGGTTACCT AAGGGTGTGG CCCTTCCGCA TAGAACTGCC TGCGTCAGAT 8640
TCTCGCATGC CAGAGATCCT ATTTTTGGCA ATCAAATCAT TCCGGATACT GCGATTTTAA 8700
GTGTTGTTCC ATTCCATCAC GGTTTTGGAA TGTTTACTAC ACTCGGATAT TTGATATGTG 8760
GATTTCGAGT CGTCTTAATG TATAGATTTG AAGAAGAGCT GTTTTTACGA TCCCTTCAGG 8820
ATTACAAAAT TCAAAGTGCG TTGCTAGTAC CAACCCTATT TTCATTCTTC GCCAAAAGCA 8880
CTCTGATTGA CAAATACGAT TTATCTAATT TACACGAAAT TGCTTCTGGG GGCGCACCTC 8940
TTTCGAAAGA AGTCGGGGAA GCGGTTGCAA AACGCTTCCA TCTTCCAGGG ATACGACAAG 9000
GATATGGGCT CACTGAGACT ACATCAGCTA TTCTGATTAC ACCCGAGGGG GATGATAAAC 9060
CGGGCGCGGT CGGTAAAGTT GTTCCATTTT TTGAAGCGAA GGTTGTGGAT CTGGATACCG 9120
GGAAAACGCT GGGCGTTAAT CAGAGAGGCG AATTATGTGT CAGAGGACCT ATGATTATGT 9180
CCGGTTATGT AAACAATCCG GAAGCGACCA ACGCCTTGAT TGACAAGGAT GGATGGCTAC 9240
ATTCTGGAGA CATAGCTTAC TGGGACGAAG ACGAACACTT CTTCATAGTT GACCGCTTGA 9300

AGTCTTTAAT TAAATACAAA GGATATCAGG TGGCCCCCGC TGAATTGGAA TCGATATTGT 9360 

TACAACACCC CAACATCTTC GACGCGGGCG TGGCAGGTCT TCCCGACGAT GACGCCGGTG 9420 

AACTTCCCGC CGCCGTTGTT GTTTTGGAGC ACGGAAAGAC GATGACGGAA AAAGAGATCG 9480 

TGGATTACGT CGCCAGTCAA GTAACAACCG CGAAAAAGTT GCGCGGAGGA GTTGTGTTTG 9540 

TGGACGAAGT ACCGAAAGGT CTTACCGGAA AACTCGACGC AAGAAAAATC AGAGAGATCC 9600 

TCATAAAGGC CAAGAAGGGC GGAAAGTCCA AATTGTAAAA TGTAACTGTA TTCAGCGATG 9660 

ACGAAATTCT TAGCTATTGT AATGACTCTA GAGGATCTTT GTGAAGGAAC CTTACTTCTG 9720 

TGGTGTGACA TAATTGGACA AACTACCTAC AGAGATTTAA AGCTCTAAGG TAAATATAAA 9780 

ATTTTTAAGT GTATAATGTG TTAAACTACT GATTCTAATT GTTTGTGTAT TTTAGATTCC 9840 

AACCTATGGA ACTGATGAAT GGGAGCAGTG GTGGAATGCC TTTAATGAGG AAAACCTGTT 9900 

TTGCTCAGAA GAAATGCCAT CTAGTGATGA TGAGGCTACT GCTGACTCTC AACATTCTAC 9960 

TCCTCCAAAA AAGAAGAGAA AGGTAGAAGA CCCCAAGGAC TTTCCTTCAG AATTGCTAAG 10020 

TTTTTTGAGT CATGCTGTGT TTAGTAATAG AACTCTTGCT TGCTTTGCTA TTTACACCAC 10080 

AAAGGAAAAA GCTGCACTGC TATACAAGAA AATTATGGAA AAATATTCTG TAACCTTTAT 10140 

AAGTAGGCAT AACAGTTATA ATCATAACAT ACTGTTTTTT CTTACTCCAC ACAGGCATAG 10200 

AGTGTCTGCT ATTAATAACT ATGCTCAAAA ATTGTGTACC TTTAGCTTTT TAATTTGTAA 10260

AGGGGTTAAT AAGGAATATT TGATGTATAG TGCCTTGACT AGAGATCATA ATCAGCCATA 10320 

CCACATTTGT AGAGGTTTTA CTTGCTTTAA AAAACCTCCC ACACCTCCCC CTGAACCTGA 10380 

AACATAAAAT GAATGCAATT GTTGTTGTTA ACTTGTTTAT TGCAGCTTAT AATGGTTACA 10440 

AATAAAGCAA TAGCATCACA AATTTCACAA ATAAAGCATT TTTTTCACTG CATTCTAGTT 10500 

GTGGTTTGTC CAAACTCATC AATGTATCTT ATCATGTCTG GATCCCCAGG AAGCTCCTCT 10560 

GTGTCCTCAT AAACCCTAAC CTCCTCTACT TGAGAGGACA TTCCAATCAT AGGCTGCCCA 10620 

TCCACCCTCT GTGTCCTCCT GTTAATTAGG TCACTTAACA AAAAGGAAAT TGGGTAGGGG 10680 

TTTTTCACAG ACCGCTTTCT AAGGGTAATT TTAAAATATC TGGGAAGTCC CTTCCACTGC 10740 

TGTGTTCCAG AAGTGTTGGT AAACAGCCCA CAAATGTCAA CAGCAGAAAC ATACAAGCTG 10800 
TCAGCTTTGC ACAAGGGCCC AACACCCTGC TCAGCAAGAA GCACTGTGGT TGCTGTGTTA 10860

GTAATGTGCA AAACAGGAGG CACATTTTCC CCACCTGTGT AGGTTCCAAA ATATCTAGTG 10920 

TTTTCATTTT TACTTGGATC AGGAACCCAG CACTCCACTG GATAAGCATT ATCCTTATCC 10980 

AAAACAGCCT TGTGGTCAGT GTTCATCTGC TGACTGTCAA CTGTAGCATT TTTTGGGGTT 11040 

ACAGTTTGAG CAGGATATTT GGTCCTGTAG TTTGCTAACA CACCCTGCAG CTCCAAAGGT 11100 

TCCCCACCAA CAGCAAAAAA ATGAAAATTT GACCCTTGAA TGGGTTTTCC AGCACCATTT 11160 

TCATGAGTTT TTTGTGTCCC TGAATGCAAG TTTAACATAG CAGTTACCCC AATAACCTCA 11220 

GTTTTAACAG TAACAGCTTC CCACATCAAA ATATTTCCAC AGGTTAAGTC CTCATTTAAA 11280 

TTAGGCAAAG GAA 11293

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10697 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 
CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 
TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCGGC 2100
ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160
TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220
TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280
ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340
CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400
CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460
TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520
AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580
GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640
CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700
TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760
TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820
AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880
AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940
TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000
TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060
AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120
AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180
CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240
GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300
GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360
GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420
ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480
CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540
CGGAATTGCC ACCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600
TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGAGAG 3660

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGGTT 3720 

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780 

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840 

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900 

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020 

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140 

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260 

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320 

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380 

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440 

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500 

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560 

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620 

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680 

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740 

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800 

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860 

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920 

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980 

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040 

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100 

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160 
TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280 

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340 

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400 

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460 

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520 

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580 

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640 

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700 

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760 

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820 

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880 

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940 

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000 

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060 

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120 

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180 

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240 

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300 

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360 

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420 

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480 

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540 

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600 

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660 

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720 
CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840 

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900 

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960 

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020 

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080 

CAAGCTTACC ATGGTAACCC CTGGTCCCGT TCAGCCACCA CCACCCCACC CAGCACACCT 7140 

CCAACCTCAG CCAGACAAGG TTGTTGACAC AAGAGAGCCC TCAGGGGCAC AGAGAGAGTC 7200 

TGGACACGTG GGGAGTCAGC CGTGTATCAT CGGAGGCGGC CGGGCACCCA CATCTGGTAT 7260 

AAAAGGAGGC AGTGGCCCAC AGAGGAGCAC AGCTGTGTTT GGCTGCAGGG CCAAGAGCGC 7320 

TGTCAAGAAG ACCCACACGC CCCCCTCCAG CAGCTGAATT CCAGCTGGCA TTCCGGTACT 7380 

GTTGGTAAAA TGGAAGACGC CAAAAACATA AAGAAAGGCC CGGCGCCATT CTATCCTCTA 7440 

GAGGATGGAA CCGCTGGAGA GCAACTGCAT AAGGCTATGA AGAGATACGC CCTGGTTCCT 7500 

GGAACAATTG CTTTTACAGA TGCACATATC GAGGTGAACA TCACGTACGC GGAATACTTC 7560 

GAAATGTCCG TTCGGTTGGC AGAAGCTATG AAACGATATG GGCTGAATAC AAATCACAGA 7620 

ATCGTCGTAT GCAGTGAAAA CTCTCTTCAA TTCTTTATGC CGGTGTTGGG CGCGTTATTT 7680 

ATCGGAGTTG CAGTTGCGCC CGCGAACGAC ATTTATAATG AACGTGAATT GCTCAACAGT 7740 

ATGAACATTT CGCAGCCTAC CGTAGTGTTT GTTTCCAAAA AGGGGTTGCA AAAAATTTTG 7800 

AACGTGCAAA AAAAATTACC AATAATCCAG AAAATTATTA TCATGGATTC TAAAACGGAT 7860 

TACCAGGGAT TTCAGTCGAT GTACACGTTC GTCACATCTC ATCTACCTCC CGGTTTTAAT 7920 

GAATACGATT TTGTACCAGA GTCCTTTGAT CGTGACAAAA CAATTGCACT GATAATGAAT 7980 

TCCTCTGGAT CTACTGGGTT ACCTAAGGGT GTGGCCCTTC CGCATAGAAC TGCCTGCGTC 8040 

AGATTCTCGC ATGCCAGAGA TCCTATTTTT GGCAATCAAA TCATTCCGGA TACTGCGATT 8100 

TTAAGTGTTG TTCCATTCCA TCACGGTTTT GGAATGTTTA CTACACTCGG ATATTTGATA 8160 

TGTGGATTTC GAGTCGTCTT AATGTATAGA TTTGAAGAAG AGCTGTTTTT ACGATCCCTT 8220 

CAGGATTACA AAATTCAAAG TGCGTTGCTA GTACCAACCC TATTTTCATT CTTCGCCAAA 8280 
AGCACTCTGA TTGACAAATA CGATTTATCT AATTTACACG AAATTGCTTC TGGGGGCGCA 8340

CCTCTTTCGA AAGAAGTCGG GGAAGCGGTT GCAAAACGCT TCCATCTTCC AGGGATACGA 8400 

CAAGGATATG GGCTCACTGA GACTACATCA GCTATTCTGA TTACACCCGA GGGGGATGAT 8460 

AAACCGGGCG CGGTCGGTAA AGTTGTTCCA TTTTTTGAAG CGAAGGTTGT GGATCTGGAT 8520 

ACCGGGAAAA CGCTGGGCGT TAATCAGAGA GGCGAATTAT GTGTCAGAGG ACCTATGATT 8580 

ATGTCCGGTT ATGTAAACAA TCCGGAAGCG ACCAACGCCT TGATTGACAA GGATGGATGG 8640 

CTACATTCTG GAGACATAGC TTACTGGGAC GAAGACGAAC ACTTCTTCAT AGTTGACCGC 8700 

TTGAAGTCTT TAATTAAATA CAAAGGATAT CAGGTGGCCC CCGCTGAATT GGAATCGATA 8760 

TTGTTACAAC ACCCCAACAT CTTCGACGCG GGCGTGGCAG GTCTTCCCGA CGATGACGCC 8820 

GGTGAACTTC CCGCCGCCGT TGTTGTTTTG GAGCACGGAA AGACGATGAC GGAAAAAGAG 8880

ATCGTGGATT ACGTCGCCAG TCAAGTAACA ACCGCGAAAA AGTTGCGCGG AGGAGTTGTG 8940 

TTTGTGGACG AAGTACCGAA AGGTCTTACC GGAAAACTCG ACGCAAGAAA AATCAGAGAG 9000 

ATCCTCATAA AGGCCAAGAA GGGCGGAAAG TCCAAATTGT AAAATGTAAC TGTATTCAGC 9060 

GATGACGAAA TTCTTAGCTA TTGTAATGAC TCTAGAGGAT CTTTGTGAAG GAACCTTACT 9120 

TCTGTGGTGT GACATAATTG GACAAACTAC CTACAGAGAT TTAAAGCTCT AAGGTAAATA 9180 

TAAAATTTTT AAGTGTATAA TGTGTTAAAC TACTGATTCT AATTGTTTGT GTATTTTAGA 9240 

TTCCAACCTA TGGAACTGAT GAATGGGAGC AGTGGTGGAA TGCCTTTAAT GAGGAAAACC 9300 

TGTTTTGCTC AGAAGAAATG CCATCTAGTG ATGATGAGGC TACTGCTGAC TCTCAACATT 9360 

CTACTCCTCC AAAAAAGAAG AGAAAGGTAG AAGACCCCAA GGACTTTCCT TCAGAATTGC 9420 

TAAGTTTTTT GAGTCATGCT GTGTTTAGTA ATAGAACTCT TGCTTGCTTT GCTATTTACA 9480 

CCACAAAGGA AAAAGCTGCA CTGCTATACA AGAAAATTAT GGAAAAATAT TCTGTAACCT 9540 

TTATAAGTAG GCATAACAGT TATAATCATA ACATACTGTT TTTTCTTACT CCACACAGGC 9600 

ATAGAGTGTC TGCTATTAAT AACTATGCTC AAAAATTGTG TACCTTTAGC TTTTTAATTT 9660 

GTAAAGGGGT TAATAAGGAA TATTTGATGT ATAGTGCCTT GACTAGAGAT CATAATCAGC 9720 

CATACCACAT TTGTAGAGGT TTTACTTGCT TTAAAAAACC TCCCACACCT CCCCCTGAAC 9780 

CTGAAACATA AAATGAATGC AATTGTTGTT GTTAACTTGT TTATTGCAGC TTATAATGGT 9840 
TACAAATAAA GCAATAGCAT CACAAATTTC ACAAATAAAG CATTTTTTTC ACTGCATTCT 9900

AGTTGTGGTT TGTCCAAACT CATCAATGTA TCTTATCATG TCTGGATCCC CAGGAAGCTC 9960 

CTCTGTGTCC TCATAAACCC TAACCTCCTC TACTTGAGAG GACATTCCAA TCATAGGCTG 10020 

CCCATCCACC CTCTGTGTCC TCCTGTTAAT TAGGTCACTT AACAAAAAGG AAATTGGGTA 10080 

GGGGTTTTTC ACAGACCGCT TTCTAAGGGT AATTTTAAAA TATCTGGGAA GTCCCTTCCA 10140 

CTGCTGTGTT CCAGAAGTGT TGGTAAACAG CCCACAAATG TCAACAGCAG AAACATACAA 10200 

GCTGTCAGCT TTGCACAAGG GCCCAACACC CTGCTCAGCA AGAAGCACTG TGGTTGCTGT 10260 

GTTAGTAATG TGCAAAACAG GAGGCACATT TTCCCCACCT GTGTAGGTTC CAAAATATCT 10320 

AGTGTTTTCA TTTTTACTTG GATCAGGAAC CCAGCACTCC ACTGGATAAG CATTATCCTT 10380 

ATCCAAAACA GCCTTGTGGT CAGTGTTCAT CTGCTGACTG TCAACTGTAG CATTTTTTGG 10440 

GGTTACAGTT TGAGCAGGAT ATTTGGTCCT GTAGTTTGCT AACACACCCT GCAGCTCCAA 10500 

AGGTTCCCCA CCAACAGCAA AAAAATGAAA ATTTGACCCT TGAATGGGTT TTCCAGCACC 10560 

ATTTTCATGA GTTTTTTGTG TCCCTGAATG CAAGTTTAAC ATAGCAGTTA CCCCAATAAC 10620 

CTCAGTTTTA ACAGTAACAG CTTCCCACAT CAAAATATTT CCACAGGTTA AGTCCTCATT 10680 

TAAATTAGGC AAAGGAA 10697 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10549 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 
AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 
TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 






GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740
GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800
CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860
GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920
TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980
CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040
TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100
ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160
TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220
TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280
ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340
CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400
CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460
TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520
AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580
GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640
CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700
TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760
TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820
AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880
AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940
TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000
TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060
AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120
AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180
CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240
GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300
GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360
GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420
ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480
CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540
CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600
TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660
GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720
GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780
CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840
GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900
TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960
GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020
TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080
ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140
AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200
AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260
ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320
CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380
AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440
CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500
CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560
GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620
CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680
CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740
AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800
GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860
ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920
GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980
AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040
CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100
GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160
TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220
TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280
ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340
ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400
ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460
TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520
ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580
ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640
ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700
AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760
CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820
TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880
GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940
CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000
AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060
ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120
AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180
TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240
AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300
CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360






TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420
ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480
CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540
TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600
GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660
ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720
CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780
TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840
TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900
TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960
CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020
CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080
CAAGTTCATC TATTTCCTCC CACATCTGGT ATAAAAGGAG GCAGTGGCCC ACAGAGGAGC 7140
ACAGCTGTGT TTGGCTGCAG GGCCAAGAGC GCTGTCAAGA AGACCCACAC GCCCCCCTCC 7200
AGCAGCTGAA TTCCAGCTGG CATTCCGGTA CTGTTGGTAA AATGGAAGAC GCCAAAAACA 7260
TAAAGAAAGG CCCGGCGCCA TTCTATCCTC TAGAGGATGG AACCGCTGGA GAGCAACTGC 7320
ATAAGGCTAT GAAGAGATAC GCCCTGGTTC CTGGAACAAT TGCTTTTACA GATGCACATA 7380
TCGAGGTGAA CATCACGTAC GCGGAATACT TCGAAATGTC CGTTCGGTTG GCAGAAGCTA 7440
TGAAACGATA TGGGCTGAAT ACAAATCACA GAATCGTCGT ATGCAGTGAA AACTCTCTTC 7500
AATTCTTTAT GCCGGTGTTG GGCGCGTTAT TTATCGGAGT TGCAGTTGCG CCCGCGAACG 7560
ACATTTATAA TGAACGTGAA TTGCTCAACA GTATGAACAT TTCGCAGCCT ACCGTAGTGT 7620
TTGTTTCCAA AAAGGGGTTG CAAAAAATTT TGAACGTGCA AAAAAAATTA CCAATAATCC 7680
AGAAAATTAT TATCATGGAT TCTAAAACGG ATTACCAGGG ATTTCAGTCG ATGTACACGT 7740
TCGTCACATC TCATCTACCT CCCGGTTTTA ATGAATACGA TTTTGTACCA GAGTCCTTTG 7800
ATCGTGACAA AACAATTGCA CTGATAATGA ATTCCTCTGG ATCTACTGGG TTACCTAAGG 7860
GTGTGGCCCT TCCGCATAGA ACTGCCTGCG TCAGATTCTC GCATGCCAGA GATCCTATTT 7920
TTGGCAATCA AATCATTCCG GATACTGCGA TTTTAAGTGT TGTTCCATTC CATCACGGTT 7980

TTGGAATGTT TACTACACTC GGATATTTGA TATGTGGATT TCGAGTCGTC TTAATGTATA 8040 

GATTTGAAGA AGAGCTGTTT TTACGATCCC TTCAGGATTA CAAAATTCAA AGTGCGTTGC 8100 

TAGTACCAAC CCTATTTTCA TTCTTCGCCA AAAGCACTCT GATTGACAAA TACGATTTAT 8160

CTAATTTACA CGAAATTGCT TCTGGGGGCG CACCTCTTTC GAAAGAAGTC GGGGAAGCGG 8220 

TTGCAAAACG CTTCCATCTT CCAGGGATAC GACAAGGATA TGGGCTCACT GAGACTACAT 8280 

CAGCTATTCT GATTACACCC GAGGGGGATG ATAAACCGGG CGCGGTCGGT AAAGTTGTTC 8340 

CATTTTTTGA AGCGAAGGTT GTGGATCTGG ATACCGGGAA AACGCTGGGC GTTAATCAGA 8400 

GAGGCGAATT ATGTGTCAGA GGACCTATGA TTATGTCCGG TTATGTAAAC AATCCGGAAG 8460

CGACCAACGC CTTGATTGAC AAGGATGGAT GGCTACATTC TGGAGACATA GCTTACTGGG 8520

ACGAAGACGA ACACTTCTTC ATAGTTGACC GCTTGAAGTC TTTAATTAAA TACAAAGGAT 8580 

ATCAGGTGGC CCCCGCTGAA TTGGAATCGA TATTGTTACA ACACCCCAAC ATCTTCGACG 8640 

CGGGCGTGGC AGGTCTTCCC GACGATGACG CCGGTGAACT TCCCGCCGCC GTTGTTGTTT 8700 

TGGAGCACGG AAAGACGATG ACGGAAAAAG AGATCGTGGA TTACGTCGCC AGTCAAGTAA 8760 

CAACCGCGAA AAAGTTGCGC GGAGGAGTTG TGTTTGTGGA CGAAGTACCG AAAGGTCTTA 8820 

CCGGAAAACT CGACGCAAGA AAAATCAGAG AGATCCTCAT AAAGGCCAAG AAGGGCGGAA 8880 

AGTCCAAATT GTAAAATGTA ACTGTATTCA GCGATGACGA AATTCTTAGC TATTGTAATG 8940 

ACTCTAGAGG ATCTTTGTGA AGGAACCTTA CTTCTGTGGT GTGACATAAT TGGACAAACT 9000 

ACCTACAGAG ATTTAAAGCT CTAAGGTAAA TATAAAATTT TTAAGTGTAT AATGTGTTAA 9060 

ACTACTGATT CTAATTGTTT GTGTATTTTA GATTCCAACC TATGGAACTG ATGAATGGGA 9120 

GCAGTGGTGG AATGCCTTTA ATGAGGAAAA CCTGTTTTGC TCAGAAGAAA TGCCATCTAG 9180 

TGATGATGAG GCTACTGCTG ACTCTCAACA TTCTACTCCT CCAAAAAAGA AGAGAAAGGT 9240 

AGAAGACCCC AAGGACTTTC CTTCAGAATT GCTAAGTTTT TTGAGTCATG CTGTGTTTAG 9300 

TAATAGAACT CTTGCTTGCT TTGCTATTTA CACCACAAAG GAAAAAGCTG CACTGCTATA 9360 

CAAGAAAATT ATGGAAAAAT ATTCTGTAAC CTTTATAAGT AGGCATAACA GTTATAATCA 9420 

TAACATACTG TTTTTTCTTA CTCCACACAG GCATAGAGTG TCTGCTATTA ATAACTATGC 9480 
TCAAAAATTG TGTACCTTTA GCTTTTTAAT TTGTAAAGGG GTTAATAAGG AATATTTGAT 9540
GTATAGTGCC TTGACTAGAG ATCATAATCA GCCATACCAC ATTTGTAGAG GTTTTACTTG 9600
CTTTAAAAAA CCTCCCACAC CTCCCCCTGA ACCTGAAACA TAAAATGAAT GCAATTGTTG 9660
TTGTTAACTT GTTTATTGCA GCTTATAATG GTTACAAATA AAGCAATAGC ATCACAAATT 9720
TCACAAATAA AGCATTTTTT TCACTGCATT CTAGTTGTGG TTTGTCCAAA CTCATCAATG 9780
TATCTTATCA TGTCTGGATC CCCAGGAAGC TCCTCTGTGT CCTCATAAAC CCTAACCTCC 9840
TCTACTTGAG AGGACATTCC AATCATAGGC TGCCCATCCA CCCTCTGTGT CCTCCTGTTA 9900
ATTAGGTCAC TTAACAAAAA GGAAATTGGG TAGGGGTTTT TCACAGACCG CTTTCTAAGG 9960
GTAATTTTAA AATATCTGGG AAGTCCCTTC CACTGCTGTG TTCCAGAAGT GTTGGTAAAC 10020
AGCCCACAAA TGTCAACAGC AGAAACATAC AAGCTGTCAG CTTTGCACAA GGGCCCAACA 10080
CCCTGCTCAG CAAGAAGCAC TGTGGTTGCT GTGTTAGTAA TGTGCAAAAC AGGAGGCACA 10140
TTTTCCCCAC CTGTGTAGGT TCCAAAATAT CTAGTGTTTT CATTTTTACT TGGATCAGGA 10200
ACCCAGCACT CCACTGGATA AGCATTATCC TTATCCAAAA CAGCCTTGTG GTCAGTGTTC 10260
ATCTGCTGAC TGTCAACTGT AGCATTTTTT GGGGTTACAG TTTGAGCAGG ATATTTGGTC 10320
CTGTAGTTTG CTAACACACC CTGCAGCTCC AAAGGTTCCC CACCAACAGC AAAAAAATGA 10380
AAATTTGACC CTTGAATGGG TTTTCCAGCA CCATTTTCAT GAGTTTTTTG TGTCCCTGAA 10440
TGCAAGTTTA ACATAGCAGT TACCCCAATA ACCTCAGTTT TAACAGTAAC AGCTTCCCAC 10500
ATCAAAATAT TTCCACAGGT TAAGTCCTCA TTTAAATTAG GCAAAGGAA 10549

(2) INFORMATION FOR SEQ ID NO: 4:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10558 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4:

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160 

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220 

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280 

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340 

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400 

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460 

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520 

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580 

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640 

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700 

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760 

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820 

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880 

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940 
TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000 
TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTG 3060

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180 

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240 

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300 

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360 

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420 

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480 

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540 

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600 

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660 

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720 

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780 

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840 

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900 

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960 

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020 

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080 

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140 

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200 

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260 

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320 

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380 

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440 

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500 

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560 

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740 

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800 

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860 

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920 

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980 

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040 

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100 

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160 

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220 

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280 

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340 

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400 

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460 

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520 

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580 

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640 

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700 

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760 

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820 

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880 

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940 

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000 

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060 

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120 

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300 

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360 

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420 

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480 

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540 

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600 

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660 

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720 

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780 

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840 

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900 

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960 

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020 

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080 

CAGTGGGGAG TCAGCCGTGT ATCATCGCCC ACATCTGGTA TAAAAGGAGG CAGTGGCCCA 7140 

CAGAGGAGCA CAGCTGTGTT TGGCTGCAGG GCCAAGAGCG CTGTCAAGAA GACCCACACG 7200 

CCCCCCTCCA GCAGCTGAAT TCCAGCTGGC ATTCCGGTAC TGTTGGTAAA ATGGAAGACG 7260 

CCAAAAACAT AAAGAAAGGC CCGGCGCCAT TCTATCCTCT AGAGGATGGA ACCGCTGGAG 7320 

AGCAACTGCA TAAGGCTATG AAGAGATACG CCCTGGTTCC TGGAACAATT GCTTTTACAG 7380

ATGCACATAT CGAGGTGAAC ATCACGTACG CGGAATACTT CGAAATGTCC GTTCGGTTGG 7440 

CAGAAGCTAT GAAACGATAT GGGCTGAATA CAAATCACAG AATCGTCGTA TGCAGTGAAA 7500

ACTCTCTTCA ATTCTTTATG CCGGTGTTGG GCGCGTTATT TATCGGAGTT GCAGTTGCGC 7560 

CCGCGAACGA CATTTATAAT GAACGTGAAT TGCTCAACAG TATGAACATT TCGCAGCCTA 7620 

CCGTAGTGTT TGTTTCCAAA AAGGGGTTGC AAAAAATTTT GAACGTGCAA AAAAAATTAC 7680 

CAATAATCCA GAAAATTATT ATCATGGATT CTAAAACGGA TTACCAGGGA TTTCAGTCGA 7740

TGTACACGTT CGTCACATCT CATCTACCTC CCGGTTTTAA TGAATACGAT TTTGTACCAG 7800

AGTCCTTTGA TCGTGACAAA ACAATTGCAC TGATAATGAA TTCCTCTGGA TCTACTGGGT 7860 

TACCTAAGGG TGTGGCCCTT CCGCATAGAA CTGCCTGCGT CAGATTCTCG CATGCCAGAG 7920 

ATCCTATTTT TGGCAATCAA ATCATTCCGG ATACTGCGAT TTTAAGTGTT GTTCCATTCC 7980 

ATCACGGTTT TGGAATGTTT ACTACACTCG GATATTTGAT ATGTGGATTT CGAGTCGTCT 8040 

TAATGTATAG ATTTGAAGAA GAGCTGTTTT TACGATCCCT TCAGGATTAC AAAATTCAAA 8100 

GTGCGTTGCT AGTACCAACC CTATTTTCAT TCTTCGCCAA AAGCACTCTG ATTGACAAAT 8160 

ACGATTTATC TAATTTACAC GAAATTGCTT CTGGGGGCGC ACCTCTTTCG AAAGAAGTCG 8220 

GGGAAGCGGT TGCAAAACGC TTCCATCTTC CAGGGATACG ACAAGGATAT GGGCTCACTG 8280 

AGACTACATC AGCTATTCTG ATTACACCCG AGGGGGATGA TAAACCGGGC GCGGTCGGTA 8340 

AAGTTGTTCC ATTTTTTGAA GCGAAGGTTG TGGATCTGGA TACCGGGAAA ACGCTGGGCG 8400 

TTAATCAGAG AGGCGAATTA TGTGTCAGAG GACCTATGAT TATGTCCGGT TATGTAAACA 8460 

ATCCGGAAGC GACCAACGCC TTGATTGACA AGGATGGATG GCTACATTCT GGAGACATAG 8520 

CTTACTGGGA CGAAGACGAA CACTTCTTCA TAGTTGACCG CTTGAAGTCT TTAATTAAAT 8580 

ACAAAGGATA TCAGGTGGCC CCCGCTGAAT TGGAATCGAT ATTGTTACAA CACCCCAACA 8640 

TCTTCGACGC GGGCGTGGCA GGTCTTCCCG ACGATGACGC CGGTGAACTT CCCGCCGCCG 8700 

TTGTTGTTTT GGAGCACGGA AAGACGATGA CGGAAAAAGA GATCGTGGAT TACGTCGCCA 8760 

GTCAAGTAAC AACCGCGAAA AAGTTGCGCG GAGGAGTTGT GTTTGTGGAC GAAGTACCGA 8820 

AAGGTCTTAC CGGAAAACTC GACGCAAGAA AAATCAGAGA GATCCTCATA AAGGCCAAGA 8880 

AGGGCGGAAA GTCCAAATTG TAAAATGTAA CTGTATTCAG CGATGACGAA ATTCTTAGCT 8940 

ATTGTAATGA CTCTAGAGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG TGACATAATT 9000 

GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT TAAGTGTATA 9060 

ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT ATGGAACTGA 9120 

TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT CAGAAGAAAT 9180 

GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC CAAAAAAGAA 9240 

GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT TGAGTCATGC 9300

TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG AAAAAGCTGC 9360

ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA GGCATAACAG 9420

TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT CTGCTATTAA 9480 

TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG TTAATAAGGA 9540 

ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA TTTGTAGAGG 9600 

TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT AAAATGAATG 9660 

CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA AGCAATAGCA 9720 

TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT TTGTCCAAAC 9780 

TCATCAATGT ATCTTATCAT GTCTGGATCC CCAGGAAGCT CCTCTGTGTC CTCATAAACC 9840 

CTAACCTCCT CTACTTGAGA GGACATTCCA ATCATAGGCT GCCCATCCAC CCTCTGTGTC 9900 

CTCCTGTTAA TTAGGTCACT TAACAAAAAG GAAATTGGGT AGGGGTTTTT CACAGACCGC 9960 

TTTCTAAGGG TAATTTTAAA ATATCTGGGA AGTCCCTTCC ACTGCTGTGT TCCAGAAGTG 10020 

TTGGTAAACA GCCCACAAAT GTCAACAGCA GAAACATACA AGCTGTCAGC TTTGCACAAG 10080 

GGCCCAACAC CCTGCTCAGC AAGAAGCACT GTGGTTGCTG TGTTAGTAAT GTGCAAAACA 10140 

GGAGGCACAT TTTCCCCACC TGTGTAGGTT CCAAAATATC TAGTGTTTTC ATTTTTACTT 10200

GGATCAGGAA CCCAGCACTC CACTGGATAA GCATTATCCT TATCCAAAAC AGCCTTGTGG 10260 

TCAGTGTTCA TCTGCTGACT GTCAACTGTA GCATTTTTTG GGGTTACAGT TTGAGCAGGA 10320 
TATTTGGTCC TGTAGTTTGC TAACACACCC TGCAGCTCCA AAGGTTCCCC ACCAACAGCA 10380

AAAAAATGAA AATTTGACCC TTGAATGGGT TTTCCAGCAC CATTTTCATG AGTTTTTTGT 10440 

GTCCCTGAAT GCAAGTTTAA CATAGCAGTT ACCCCAATAA CCTCAGTTTT AACAGTAACA 10500 

GCTTCCCACA TCAAAATATT TCCACAGGTT AAGTCCTCAT TTAAATTAGG CAAAGGAA 10558 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10569 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular 



(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 
TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560 

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620 

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680 

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740 

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800 

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860 

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920 

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980 

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040 

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100 

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160 

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220 

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280 

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340 

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400 

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460 

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520 

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580 

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640 

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700 

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760 

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820 

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880 

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940 

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120 

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180 

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240 

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300 

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360 

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420 

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480 

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540 

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600 

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660 

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720 

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780 

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840 

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900 

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960 

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020 

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080 

CACTCCAACC TCAGCCAGAC AAGGTTGTTG ACACAAGACC CACATCTGGT ATAAAAGGAG 7140 

GCAGTGGCCC ACAGAGGAGC ACAGCTGTGT TTGGCTGCAG GGCCAAGAGC GCTGTCAAGA 7200 

AGACCCACAC GCCCCCCTCC AGCAGCTGAA TTCCAGCTGG CATTCCGGTA CTGTTGGTAA 7260 

AATGGAAGAC GCCAAAAACA TAAAGAAAGG CCCGGCGCCA TTCTATCCTC TAGAGGATGG 7320 

AACCGCTGGA GAGCAACTGC ATAAGGCTAT GAAGAGATAC GCCCTGGTTC CTGGAACAAT 7380 

TGCTTTTACA GATGCACATA TCGAGGTGAA CATCACGTAC GCGGAATACT TCGAAATGTC 7440 

CGTTCGGTTG GCAGAAGCTA TGAAACGATA TGGGCTGAAT ACAAATCACA GAATCGTCGT 7500 

ATGCAGTGAA AACTCTCTTC AATTCTTTAT GCCGGTGTTG GGCGCGTTAT TTATCGGAGT 7560

TGCAGTTGCG CCCGCGAACG ACATTTATAA TGAACGTGAA TTGCTCAACA GTATGAACAT 7620

TTCGCAGCCT ACCGTAGTGT TTGTTTCCAA AAAGGGGTTG CAAAAAATTT TGAACGTGCA 7680 

AAAAAAATTA CCAATAATCC AGAAAATTAT TATCATGGAT TCTAAAACGG ATTACCAGGG 7740 

ATTTCAGTCG ATGTACACGT TCGTCACATC TCATCTACCT CCCGGTTTTA ATGAATACGA 7800 

TTTTGTACCA GAGTCCTTTG ATCGTGACAA AACAATTGCA CTGATAATGA ATTCCTCTGG 7860 

ATCTACTGGG TTACCTAAGG GTGTGGCCCT TCCGCATAGA ACTGCCTGCG TCAGATTCTC 7920 

GCATGCCAGA GATCCTATTT TTGGCAATCA AATCATTCCG GATACTGCGA TTTTAAGTGT 7980 

TGTTCCATTC CATCACGGTT TTGGAATGTT TACTACACTC GGATATTTGA TATGTGGATT 8040 

TCGAGTCGTC TTAATGTATA GATTTGAAGA AGAGCTGTTT TTACGATCCC TTCAGGATTA 8100 

CAAAATTCAA AGTGCGTTGC TAGTACCAAC CCTATTTTCA TTCTTCGCCA AAAGCACTCT 8160 

GATTGACAAA TACGATTTAT CTAATTTACA CGAAATTGCT TCTGGGGGCG CACCTCTTTC 8220 

GAAAGAAGTC GGGGAAGCGG TTGCAAAACG CTTCCATCTT CCAGGGATAC GACAAGGATA 8280 

TGGGCTCACT GAGACTACAT CAGCTATTCT GATTACACCC GAGGGGGATG ATAAACCGGG 8340 

CGCGGTCGGT AAAGTTGTTC CATTTTTTGA AGCGAAGGTT GTGGATCTGG ATACCGGGAA 8400 

AACGCTGGGC GTTAATCAGA GAGGCGAATT ATGTGTCAGA GGACCTATGA TTATGTCCGG 8460 

TTATGTAAAC AATCCGGAAG CGACCAACGC CTTGATTGAC AAGGATGGAT GGCTACATTC 8520 

TGGAGACATA GCTTACTGGG ACGAAGACGA ACACTTCTTC ATAGTTGACC GCTTGAAGTC 8580 

TTTAATTAAA TACAAAGGAT ATCAGGTGGC CCCCGCTGAA TTGGAATCGA TATTGTTACA 8640 

ACACCCCAAC ATCTTCGACG CGGGCGTGGC AGGTCTTCCC GACGATGACG CCGGTGAACT 8700 

TCCCGCCGCC GTTGTTGTTT TGGAGCACGG AAAGACGATG ACGGAAAAAG AGATCGTGGA 8760 

TTACGTCGCC AGTCAAGTAA CAACCGCGAA AAAGTTGCGC GGAGGAGTTG TGTTTGTGGA 8820 

CGAAGTACCG AAAGGTCTTA CCGGAAAACT CGACGCAAGA AAAATCAGAG AGATCCTCAT 8880 

AAAGGCCAAG AAGGGCGGAA AGTCCAAATT GTAAAATGTA ACTGTATTCA GCGATGACGA 8940 

AATTCTTAGC TATTGTAATG ACTCTAGAGG ATCTTTGTGA AGGAACCTTA CTTCTGTGGT 9000 

GTGACATAAT TGGACAAACT ACCTACAGAG ATTTAAAGCT CTAAGGTAAA TATAAAATTT 9060 

TTAAGTGTAT AATGTGTTAA ACTACTGATT CTAATTGTTT GTGTATTTTA GATTCCAACC 9120

TATGGAACTG ATGAATGGGA GCAGTGGTGG AATGCCTTTA ATGAGGAAAA CCTGTTTTGC 9180

TCAGAAGAAA TGCCATCTAG TGATGATGAG GCTACTGCTG ACTCTCAACA TTCTACTCCT 9240

CCAAAAAAGA AGAGAAAGGT AGAAGACCCC AAGGACTTTC CTTCAGAATT GCTAAGTTTT 9300

TTGAGTCATG CTGTGTTTAG TAATAGAACT CTTGCTTGCT TTGCTATTTA CACCACAAAG 9360

GAAAAAGCTG CACTGCTATA CAAGAAAATT ATGGAAAAAT ATTCTGTAAC CTTTATAAGT 9420

AGGCATAACA GTTATAATCA TAACATACTG TTTTTTCTTA CTCCACACAG GCATAGAGTG 9480

TCTGCTATTA ATAACTATGC TCAAAAATTG TGTACCTTTA GCTTTTTAAT TTGTAAAGGG 9540

GTTAATAAGG AATATTTGAT GTATAGTGCC TTGACTAGAG ATCATAATCA GCCATACCAC 9600

ATTTGTAGAG GTTTTACTTG CTTTAAAAAA CCTCCCACAC CTCCCCCTGA ACCTGAAACA 9660

TAAAATGAAT GCAATTGTTG TTGTTAACTT GTTTATTGCA GCTTATAATG GTTACAAATA 9720

AAGCAATAGC ATCACAAATT TCACAAATAA AGCATTTTTT TCACTGCATT CTAGTTGTGG 9780

TTTGTCCAAA CTCATCAATG TATCTTATCA TGTCTGGATC CCCAGGAAGC TCCTCTGTGT 9840

CCTCATAAAC CCTAACCTCC TCTACTTGAG AGGACATTCC AATCATAGGC TGCCCATCCA 9900

CCCTCTGTGT CCTCCTGTTA ATTAGGTCAC TTAACAAAAA GGAAATTGGG TAGGGGTTTT 9960

TCACAGACCG CTTTCTAAGG GTAATTTTAA AATATCTGGG AAGTCCCTTC CACTGCTGTG 10020

TTCCAGAAGT GTTGGTAAAC AGCCCACAAA TGTCAACAGC AGAAACATAC AAGCTGTCAG 10080

CTTTGCACAA GGGCCCAACA CCCTGCTCAG CAAGAAGCAC TGTGGTTGCT GTGTTAGTAA 10140

TGTGCAAAAC AGGAGGCACA TTTTCCCCAC CTGTGTAGGT TCCAAAATAT CTAGTGTTTT 10200

CATTTTTACT TGGATCAGGA ACCCAGCACT CCACTGGATA AGCATTATCC TTATCCAAAA 10260

CAGCCTTGTG GTCAGTGTTC ATCTGCTGAC TGTCAACTGT AGCATTTTTT GGGGTTACAG 10320

TTTGAGCAGG ATATTTGGTC CTGTAGTTTG CTAACACACC CTGCAGCTCC AAAGGTTCCC 10380

CACCAACAGC AAAAAAATGA AAATTTGACC CTTGAATGGG TTTTCCAGCA CCATTTTCAT 10440

GAGTTTTTTG TGTCCCTGAA TGCAAGTTTA ACATAGCAGT TACCCCAATA ACCTCAGTTT 10500

TAACAGTAAC AGCTTCCCAC ATCAAAATAT TTCCACAGGT TAAGTCCTCA TTTAAATTAG 10560

GCAAAGGAA 10569

(2) INFORMATION FOR SEQ ID NO: 6:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 10558 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760 

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820 

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880 

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000 

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060 

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180 

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240 

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300 

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360 

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420 

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480 

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540 

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600 

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660 

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720 

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780 

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840 

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900 

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960 

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080 

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140 

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880 

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940 

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000 

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060 

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120 

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180 

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240 

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300 

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360 

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420 

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480 

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540 

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600 

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660 

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720 

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840 

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900 

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960 

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020 

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080 

CAGCCAGACA AGGTTGTTGA CACAAGACCC ACATCTGGTA TAAAAGGAGG CAGTGGCCCA 7140 

CAGAGGAGCA CAGCTGTGTT TGGCTGCAGG GCCAAGAGCG CTGTCAAGAA GACCCACACG 7200 

CCCCCCTCCA GCAGCTGAAT TCCAGCTGGC ATTCCGGTAC TGTTGGTAAA ATGGAAGACG 7260 

CCAAAAACAT AAAGAAAGGC CCGGCGCCAT TCTATCCTCT AGAGGATGGA ACCGCTGGAG 7320

AGCAACTGCA TAAGGCTATG AAGAGATACG CCCTGGTTCC TGGAACAATT GCTTTTACAG 7380

ATGCACATAT CGAGGTGAAC ATCACGTACG CGGAATACTT CGAAATGTCC GTTCGGTTGG 7440

CAGAAGCTAT GAAACGATAT GGGCTGAATA CAAATCACAG AATCGTCGTA TGCAGTGAAA 7500

ACTCTCTTCA ATTCTTTATG CCGGTGTTGG GCGCGTTATT TATCGGAGTT GCAGTTGCGC 7560

CCGCGAACGA CATTTATAAT GAACGTGAAT TGCTCAACAG TATGAACATT TCGCAGCCTA 7620

CCGTAGTGTT TGTTTCCAAA AAGGGGTTGC AAAAAATTTT GAACGTGCAA AAAAAATTAC 7680

CAATAATCCA GAAAATTATT ATCATGGATT CTAAAACGGA TTACCAGGGA TTTCAGTCGA 7740

TGTACACGTT CGTCACATCT CATCTACCTC CCGGTTTTAA TGAATACGAT TTTGTACCAG 7800

AGTCCTTTGA TCGTGACAAA ACAATTGCAC TGATAATGAA TTCCTCTGGA TCTACTGGGT 7860

TACCTAAGGG TGTGGCCCTT CCGCATAGAA CTGCCTGCGT CAGATTCTCG CATGCCAGAG 7920

ATCCTATTTT TGGCAATCAA ATCATTCCGG ATACTGCGAT TTTAAGTGTT GTTCCATTCC 7980

ATCACGGTTT TGGAATGTTT ACTACACTCG GATATTTGAT ATGTGGATTT CGAGTCGTCT 8040

TAATGTATAG ATTTGAAGAA GAGCTGTTTT TACGATCCCT TCAGGATTAC AAAATTCAAA 8100

GTGCGTTGCT AGTACCAACC CTATTTTCAT TCTTCGCCAA AAGCACTCTG ATTGACAAAT 8160

ACGATTTATC TAATTTACAC GAAATTGCTT CTGGGGGCGC ACCTCTTTCG AAAGAAGTCG 8220

GGGAAGCGGT TGCAAAACGC TTCCATCTTC CAGGGATACG ACAAGGATAT GGGCTCACTG 8280

AGACTACATC AGCTATTCTG ATTACACCCG AGGGGGATGA TAAACCGGGC GCGGTCGGTA 8340

AAGTTGTTCC ATTTTTTGAA GCGAAGGTTG TGGATCTGGA TACCGGGAAA ACGCTGGGCG 8400

TTAATCAGAG AGGCGAATTA TGTGTCAGAG GACCTATGAT TATGTCCGGT TATGTAAACA 8460

ATCCGGAAGC GACCAACGCC TTGATTGACA AGGATGGATG GCTACATTCT GGAGACATAG 8520

CTTACTGGGA CGAAGACGAA CACTTCTTCA TAGTTGACCG CTTGAAGTCT TTAATTAAAT 8580

ACAAAGGATA TCAGGTGGCC CCCGCTGAAT TGGAATCGAT ATTGTTACAA CACCCCAACA 8640

TCTTCGACGC GGGCGTGGCA GGTCTTCCCG ACGATGACGC CGGTGAACTT CCCGCCGCCG 8700

TTGTTGTTTT GGAGCACGGA AAGACGATGA CGGAAAAAGA GATCGTGGAT TACGTCGCCA 8760

GTCAAGTAAC AACCGCGAAA AAGTTGCGCG GAGGAGTTGT GTTTGTGGAC GAAGTACCGA 8820

AAGGTCTTAC CGGAAAACTC GACGCAAGAA AAATCAGAGA GATCCTCATA AAGGCCAAGA 8880

AGGGCGGAAA GTCCAAATTG TAAAATGTAA CTGTATTCAG CGATGACGAA ATTCTTAGCT 8940

ATTGTAATGA CTCTAGAGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG TGACATAATT 9000 

GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT TAAGTGTATA 9060 

ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT ATGGAACTGA 9120 

TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT CAGAAGAAAT 9180 

GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC CAAAAAAGAA 9240 

GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAACTTTTT TGAGTCATGC 9300

TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG AAAAAGCTGC 9360 

ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA GGCATAACAG 9420 

TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT CTGCTATTAA 9480 

TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG TTAATAAGGA 9540 

ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA TTTGTAGAGG 9600 

TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT AAAATGAATG 9660 

CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA AGCAATAGCA 9720 

TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT TTGTCCAAAC 9780 

TCATCAATGT ATCTTATCAT GTCTGGATCC CCAGGAAGCT CCTCTGTGTC CTCATAAACC 9840 

CTAACCTCCT CTACTTGAGA GGACATTCCA ATCATAGGCT GCCCATCCAC CCTCTGTGTC 9900 

CTCCTGTTAA TTAGGTCACT TAACAAAAAG GAAATTGGGT AGGGGTTTTT CACAGACCGC 9960 

TTTCTAAGGG TAATTTTAAA ATATCTGGGA AGTCCCTTCC ACTGCTGTGT TCCAGAAGTG 10020 

TTGGTAAACA GCCCACAAAT GTCAACAGCA GAAACATACA AGCTGTCAGC TTTGCACAAG 10080 

GGCCCAACAC CCTGCTCAGC AAGAAGCACT GTGGTTGCTG TGTTAGTAAT GTGCAAAACA 10140 

GGAGGCACAT TTTCCCCACC TGTGTAGGTT CCAAAATATC TAGTGTTTTC ATTTTTACTT 10200

GGATCAGGAA CCCAGCACTC CACTGGATAA GCATTATCCT TATCCAAAAC AGCCTTGTGG 10260 

TCAGTGTTCA TCTGCTGACT GTCAACTGTA GCATTTTTTG GGGTTACAGT TTGAGCAGGA 10320 

TATTTGGTCC TGTAGTTTGC TAACACACCC TGCAGCTCCA AAGGTTCCCC ACCAACAGCA 10380 

AAAAAATGAA AATTTGACCC TTGAATGGGT TTTCCAGCAC CATTTTCATG AGTTTTTTGT 10440

GTCCCTGAAT GCAAGTTTAA CATAGCAGTT ACCCCAATAA CCTCAGTTTT AACAGTAACA 10500
GCTTCCCACA TCAAAATATT TCCACAGGTT AAGTCCTCAT TTAAATTAGG CAAAGGAA 10558 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6245 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 



TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160 

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220 

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280 

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340 

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400 

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580 

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640 

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700 

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760 

TATCATGTCT GGATCCCAAG TTCATCTATT TCCTCCCACA TCTGGTATAA AAGGAGGCAG 2820 

TGGCCCACAG AGGAGCACAG CTGTGTTTGG CTGCAGGGCC AAGAGCGCTG TCAAGAAGAC 2880 

CCACACGCCC CCCTCCAGCA GCTGAATTCC AGCTGGCATT CCGGTACTGT TGGTAAAATG 2940 

GAAGACGCCA AAAACATAAA GAAAGGCCCG GCGCCATTCT ATCCTCTAGA GGATGGAACC 3000 

GCTGGAGAGC AACTGCATAA GGCTATGAAG AGATACGCCC TGGTTCCTGG AACAATTGCT 3060 

TTTACAGATG CACATATCGA GGTGAACATC ACGTACGCGG AATACTTCGA AATGTCCGTT 3120 

CGGTTGGCAG AAGCTATGAA ACGATATGGG CTGAATACAA ATCACAGAAT CGTCGTATGC 3180 

AGTGAAAACT CTCTTCAATT CTTTATGCCG GTGTTGGGCG CGTTATTTAT CGGAGTTGCA 3240 

GTTGCGCCCG CGAACGACAT TTATAATGAA CGTGAATTGC TCAACAGTAT GAACATTTCG 3300 

CAGCCTACCG TAGTGTTTGT TTCCAAAAAG GGGTTGCAAA AAATTTTGAA CGTGCAAAAA 3360 

AAATTACCAA TAATCCAGAA AATTATTATC ATGGATTCTA AAACGGATTA CCAGGGATTT 3420 

CAGTCGATGT ACACGTTCGT CACATCTCAT CTACCTCCCG GTTTTAATGA ATACGATTTT 3480 

GTACCAGAGT CCTTTGATCG TGACAAAACA ATTGCACTGA TAATGAATTC CTCTGGATCT 3540 

ACTGGGTTAC CTAAGGGTGT GGCCCTTCCG CATAGAACTG CCTGCGTCAG ATTCTCGCAT 3600 

GCCAGAGATC CTATTTTTGG CAATCAAATC ATTCCGGATA CTGCGATTTT AAGTGTTGTT 3660 

CCATTCCATC ACGGTTTTGG AATGTTTACT ACACTCGGAT ATTTGATATG TGGATTTCGA 3720 

GTCGTCTTAA TGTATAGATT TGAAGAAGAG CTGTTTTTAC GATCCCTTCA GGATTACAAA 3780 

ATTCAAAGTG CGTTGCTAGT ACCAACCCTA TTTTCATTCT TCGCCAAAAG CACTCTGATT 3840 

GACAAATACG ATTTATCTAA TTTACACGAA ATTGCTTCTG GGGGCGCACC TCTTTCGAAA 3900 

GAAGTCGGGG AAGCGGTTGC AAAACGCTTC CATCTTCCAG GGATACGACA AGGATATGGG 3960 

CTCACTGAGA CTACATCAGC TATTCTGATT ACACCCGAGG GGGATGATAA ACCGGGCGCG 4020

GTCGGTAAAG TTGTTCCATT TTTTGAAGCG AAGGTTGTGG ATCTGGATAC CGGGAAAACG 4080

CTGGGCGTTA ATCAGAGAGG CGAATTATGT GTCAGAGGAC CTATGATTAT GTCCGGTTAT 4140

GTAAACAATC CGGAAGCGAC CAACGCCTTG ATTGACAAGG ATGGATGGCT ACATTCTGGA 4200 

GACATAGCTT ACTGGGACGA AGACGAACAC TTCTTCATAG TTGACCGCTT GAAGTCTTTA 4260 

ATTAAATACA AAGGATATCA GGTGGCCCCC GCTGAATTGG AATCGATATT GTTACAACAC 4320 

CCCAACATCT TCGACGCGGG CGTGGCAGGT CTTCCCGACG ATGACGCCGG TGAACTTCCC 4380 

GCCGCCGTTG TTGTTTTGGA GCACGGAAAG ACGATGACGG AAAAAGAGAT CGTGGATTAC 4440 

GTCGCCAGTC AAGTAACAAC CGCGAAAAAG TTGCGCGGAG GAGTTGTGTT TGTGGACGAA 4500 

GTACCGAAAG GTCTTACCGG AAAACTCGAC GCAAGAAAAA TCAGAGAGAT CCTCATAAAG 4560 

GCCAAGAAGG GCGGAAAGTC CAAATTGTAA AATGTAACTG TATTCAGCGA TGACGAAATT 4620 

CTTAGCTATT GTAATGACTC TAGAGGATCT TTGTGAAGGA ACCTTACTTC TGTGGTGTGA 4680 

CATAATTGGA CAAACTACCT ACAGAGATTT AAAGCTCTAA GGTAAATATA AAATTTTTAA 4740 

GTGTATAATG TGTTAAACTA CTGATTCTAA TTGTTTGTGT ATTTTAGATT CCAACCTATG 4800 

GAACTGATGA ATGGGAGCAG TGGTGGAATG CCTTTAATGA GGAAAACCTG TTTTGCTCAG 4860 

AAGAAATGCC ATCTAGTGAT GATGAGGCTA CTGCTGACTC TCAACATTCT ACTCCTCCAA 4920 

AAAAGAAGAG AAAGGTAGAA GACCCCAAGG ACTTTCCTTC AGAATTGCTA AGTTTTTTGA 4980 

GTCATGCTGT GTTTAGTAAT AGAACTCTTG CTTGCTTTGC TATTTACACC ACAAAGGAAA 5040 

AAGCTGCACT GCTATACAAG AAAATTATGG AAAAATATTC TGTAACCTTT ATAAGTAGGC 5100 

ATAACAGTTA TAATCATAAC ATACTGTTTT TTCTTACTCC ACACAGGCAT AGAGTGTCTG 5160 

CTATTAATAA CTATGCTCAA AAATTGTGTA CCTTTAGCTT TTTAATTTGT AAAGGGGTTA 5220 

ATAAGGAATA TTTGATGTAT AGTGCCTTGA CTAGAGATCA TAATCAGCCA TACCACATTT 5280 

GTAGAGGTTT TACTTGCTTT AAAAAACCTC CCACACCTCC CCCTGAACCT GAAACATAAA 5340 

ATGAATGCAA TTGTTGTTGT TAACTTGTTT ATTGCAGCTT ATAATGGTTA CAAATAAAGC 5400 

AATAGCATCA CAAATTTCAC AAATAAAGCA TTTTTTTCAC TGCATTCTAG TTGTGGTTTG 5460 

TCCAAACTCA TCAATGTATC TTATCATGTC TGGATCCCCA GGAAGCTCCT CTGTGTCCTC 5520 

ATAAACCCTA ACCTCCTCTA CTTGAGAGGA CATTCCAATC ATAGGCTGCC CATCCACCCT 5580

CTGTGTCCTC CTGTTAATTA GGTCACTTAA CAAAAAGGAA ATTGGGTAGG GGTTTTTCAC 5640

AGACCGCTTT CTAAGGGTAA TTTTAAAATA TCTGGGAAGT CCCTTCCACT GCTGTGTTCC 5700

AGAAGTGTTG GTAAACAGCC CACAAATGTC AACAGCAGAA ACATACAAGC TGTCAGCTTT 5760

GCACAAGGGC CCAACACCCT GCTCAGCAAG AAGCACTGTG GTTGCTGTGT TAGTAATGTG 5820

CAAAACAGGA GGCACATTTT CCCCACCTGT GTAGGTTCCA AAATATCTAG TGTTTTCATT 5880

TTTACTTGGA TCAGGAACCC AGCACTCCAC TGGATAAGCA TTATCCTTAT CCAAAACAGC 5940

CTTGTGGTCA GTGTTCATCT GCTGACTGTC AACTGTAGCA TTTTTTGGGG TTACAGTTTG 6000

AGCAGGATAT TTGGTCCTGT AGTTTGCTAA CACACCCTGC AGCTCCAAAG GTTCCCCACC 6060

AACAGCAAAA AAATGAAAAT TTGACCCTTG AATGGGTTTT CCAGCACCAT TTTCATGAGT 6120

TTTTTGTGTC CCTGAATGCA AGTTTAACAT AGCAGTTACC CCAATAACCT CAGTTTTAAC 6180

AGTAACAGCT TCCCACATCA AAATATTTCC ACAGGTTAAG TCCTCATTTA AATTAGGCAA 6240

AGGAA 6245

(2) INFORMATION FOR SEQ ID NO: 8:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 6254 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760

TATCATGTCT GGATCCCAGT GGGGAGTCAG CCGTGTATCA TCGCCCACAT CTGGTATAAA 2820

AGGAGGCAGT GGCCCACAGA GGAGCACAGC TGTGTTTGGC TGCAGGGCCA AGAGCGCTGT 2880

CAAGAAGACC CACACGCCCC CCTCCAGCAG CTGAATTCCA GCTGGCATTC CGGTACTGTT 2940

GGTAAAATGG AAGACGCCAA AAACATAAAG AAAGGCCCGG CGCCATTCTA TCCTCTAGAG 3000

GATGGAACCG CTGGAGAGCA ACTGCATAAG GCTATGAAGA GATACGCCCT GGTTCCTGGA 3060

ACAATTGCTT TTACAGATGC ACATATCGAG GTGAACATCA CGTACGCGGA ATACTTCGAA 3120

ATGTCCGTTC GGTTGGCAGA AGCTATGAAA CGATATGGGC TGAATACAAA TCACAGAATC 3180

GTCGTATGCA GTGAAAACTC TCTTCAATTC TTTATGCCGG TGTTGGGCGC GTTATTTATC 3240

GGAGTTGCAG TTGCGCCCGC GAACGACATT TATAATGAAC GTGAATTGCT CAACAGTATG 3300

AACATTTCGC AGCCTACCGT AGTGTTTGTT TCCAAAAAGG GGTTGCAAAA AATTTTGAAC 3360

GTGCAAAAAA AATTACCAAT AATCCAGAAA ATTATTATCA TGGATTCTAA AACGGATTAC 3420

CAGGGATTTC AGTCGATGTA CACGTTCGTC ACATCTCATC TACCTCCCGG TTTTAATGAA 3480

TACGATTTTG TACCAGAGTC CTTTGATCGT GACAAAACAA TTGCACTGAT AATGAATTCC 3540 

TCTGGATCTA CTGGGTTACC TAAGGGTGTG GCCCTTCCGC ATAGAACTGC CTGCGTCAGA 3600 

TTCTCGCATG CCAGAGATCC TATTTTTGGC AATCAAATCA TTCCGGATAC TGCGATTTTA 3660 

AGTGTTGTTC CATTCCATCA CGGTTTTGGA ATGTTTACTA CACTCGGATA TTTGATATGT 3720 

GGATTTCGAG TCGTCTTAAT GTATAGATTT GAAGAAGAGC TGTTTTTACG ATCCCTTCAG 3780 

GATTACAAAA TTCAAAGTGC GTTGCTAGTA CCAACCCTAT TTTCATTCTT CGCCAAAAGC 3840 

ACTCTGATTG ACAAATACGA TTTATCTAAT TTACACGAAA TTGCTTCTGG GGGCGCACCT 3900 

CTTTCGAAAG AAGTCGGGGA AGCGGTTGCA AAACGCTTCC ATCTTCCAGG GATACGACAA 3960 

GGATATGGGC TCACTGAGAC TACATCAGCT ATTCTGATTA CACCCGAGGG GGATGATAAA 4020

CCGGGCGCGG TCGGTAAAGT TGTTCCATTT TTTGAAGCGA AGGTTGTGGA TCTGGATACC 4080 

GGGAAAACGC TGGGCGTTAA TCAGAGAGGC GAATTATGTG TCAGAGGACC TATGATTATG 4140 

TCCGGTTATG TAAACAATCC GGAAGCGACC AACGCCTTGA TTGACAAGGA TGGATGGCTA 4200 

CATTCTGGAG ACATAGCTTA CTGGGACGAA GACGAACACT TCTTCATAGT TGACCGCTTG 4260 

AAGTCTTTAA TTAAATACAA AGGATATCAG GTGGCCCCCG CTGAATTGGA ATCGATATTG 4320 

TTACAACACC CCAACATCTT CGACGCGGGC GTGGCAGGTC TTCCCGACGA TGACGCCGGT 4380 

GAACTTCCCG CCGCCGTTGT TGTTTTGGAG CACGGAAAGA CGATGACGGA AAAAGAGATC 4440 

GTGGATTACG TCGCCAGTCA AGTAACAACC GCGAAAAAGT TGCGCGGAGG AGTTGTGTTT 4500 

GTGGACGAAG TACCGAAAGG TCTTACCGGA AAACTCGACG CAAGAAAAAT CAGAGAGATC 4560 

CTCATAAAGG CCAAGAAGGG CGGAAAGTCC AAATTGTAAA ATGTAACTGT ATTCAGCGAT 4620 

GACGAAATTC TTAGCTATTG TAATGACTCT AGAGGATCTT TGTGAAGGAA CCTTACTTCT 4680 

GTGGTGTGAC ATAATTGGAC AAACTACCTA CAGAGATTTA AAGCTCTAAG GTAAATATAA 4740 

AATTTTTAAG TGTATAATGT GTTAAACTAC TGATTCTAAT TGTTTGTGTA TTTTAGATTC 4800 

CAACCTATGG AACTGATGAA TGGGAGCAGT GGTGGAATGC CTTTAATGAG GAAAACCTGT 4860

TTTGCTCAGA AGAAATGCCA TCTAGTGATG ATGAGGCTAC TGCTGACTCT CAACATTCTA 4920 

CTCCTCCAAA AAAGAAGAGA AAGGTAGAAG ACCCCAAGGA CTTTCCTTCA GAATTGCTAA 4980

GTTTTTTGAG TCATGCTGTG TTTAGTAATA GAACTCTTGC TTGCTTTGCT ATTTACACCA 5040

CAAAGGAAAA AGCTGCACTG CTATACAAGA AAATTATGGA AAAATATTCT GTAACCTTTA 5100

TAAGTAGGCA TAACAGTTAT AATCATAACA TACTGTTTTT TCTTACTCCA CACAGGCATA 5160

GAGTGTCTGC TATTAATAAC TATGCTCAAA AATTGTGTAC CTTTAGCTTT TTAATTTGTA 5220

AAGGGGTTAA TAAGGAATAT TTGATGTATA GTGCCTTGAC TAGAGATCAT AATCAGCCAT 5280

ACCACATTTG TAGAGGTTTT ACTTGCTTTA AAAAACCTCC CACACCTCCC CCTGAACCTG 5340

AAACATAAAA TGAATGCAAT TGTTGTTGTT AACTTGTTTA TTGCAGCTTA TAATGGTTAC 5400

AAATAAAGCA ATAGCATCAC AAATTTCACA AATAAAGCAT TTTTTTCACT GCATTCTAGT 5460

TGTGGTTTGT CCAAACTCAT CAATGTATCT TATCATGTCT GGATCCCCAG GAAGCTCCTC 5520

TGTGTCCTCA TAAACCCTAA CCTCCTCTAC TTGAGAGGAC ATTCCAATCA TAGGCTGCCC 5580

ATCCACCCTC TGTGTCCTCC TGTTAATTAG GTCACTTAAC AAAAAGGAAA TTGGGTAGGG 5640

GTTTTTCACA GACCGCTTTC TAAGGGTAAT TTTAAAATAT CTGGGAAGTC CCTTCCACTG 5700

CTGTGTTCCA GAAGTGTTGG TAAACAGCCC ACAAATGTCA ACAGCAGAAA CATACAAGCT 5760

GTCAGCTTTG CACAAGGGCC CAACACCCTG CTCAGCAAGA AGCACTGTGG TTGCTGTGTT 5820

AGTAATGTGC AAAACAGGAG GCACATTTTC CCCACCTGTG TAGGTTCCAA AATATCTAGT 5880

GTTTTCATTT TTACTTGGAT CAGGAACCCA GCACTCCACT GGATAAGCAT TATCCTTATC 5940

CAAAACAGCC TTGTGGTCAG TGTTCATCTG CTGACTGTCA ACTGTAGCAT TTTTTGGGGT 6000

TACAGTTTGA GCAGGATATT TGGTCCTGTA GTTTGCTAAC ACACCCTGCA GCTCCAAAGG 6060

TTCCCCACCA ACAGCAAAAA AATGAAAATT TGACCCTTGA ATGGGTTTTC CAGCACCATT 6120

TTCATGAGTT TTTTGTGTCC CTGAATGCAA GTTTAACATA GCAGTTACCC CAATAACCTC 6180

AGTTTTAACA GTAACAGCTT CCCACATCAA AATATTTCCA CAGGTTAAGT CCTCATTTAA 6240

ATTAGGCAAA GGAA 6254

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6265 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760

TATCATGTCT GGATCCCACT CCAACCTCAG CCAGACAAGG TTGTTGACAC AAGACCCACA 2820

TCTGGTATAA AAGGAGGCAG TGGCCCACAG AGGAGCACAG CTGTGTTTGG CTGCAGGGCC 2880

AAGAGCGCTG TCAAGAAGAC CCACACGCCC CCCTCCAGCA GCTGAATTCC AGCTGGCATT 2940 

CCGGTACTGT TGGTAAAATG GAAGACGCCA AAAACATAAA GAAAGGCCCG GCGCCATTCT 3000 

ATCCTCTAGA GGATGGAACC GCTGGAGAGC AACTGCATAA GGCTATGAAG AGATACGCCC 3060 

TGGTTCCTGG AACAATTGCT TTTACAGATG CACATATCGA GGTGAACATC ACGTACGCGG 3120 

AATACTTCGA AATGTCCGTT CGGTTGGCAG AAGCTATGAA ACGATATGGG CTGAATACAA 3180 

ATCACAGAAT CGTCGTATGC AGTGAAAACT CTCTTCAATT CTTTATGCCG GTGTTGGGCG 3240 

CGTTATTTAT CGGAGTTGCA GTTGCGCCCG CGAACGACAT TTATAATGAA CGTGAATTGC 3300 

TCAACAGTAT GAACATTTCG CAGCCTACCG TAGTGTTTGT TTCCAAAAAG GGGTTGCAAA 3360 

AAATTTTGAA CGTGCAAAAA AAATTACCAA TAATCCAGAA AATTATTATC ATGGATTCTA 3420 

AAACGGATTA CCAGGGATTT CAGTCGATGT ACACGTTCGT CACATCTCAT CTACCTCCCG 3480 

GTTTTAATGA ATACGATTTT GTACCAGAGT CCTTTGATCG TGACAAAACA ATTGCACTGA 3540 

TAATGAATTC CTCTGGATCT ACTGGGTTAC CTAAGGGTGT GGCCCTTCCG CATAGAACTG 3600 

CCTGCGTCAG ATTCTCGCAT GCCAGAGATC CTATTTTTGG CAATCAAATC ATTCCGGATA 3660 

CTGCGATTTT AAGTGTTGTT CCATTCCATC ACGGTTTTGG AATGTTTACT ACACTCGGAT 3720 

ATTTGATATG TGGATTTCGA GTCGTCTTAA TGTATAGATT TGAAGAAGAG CTGTTTTTAC 3780 

GATCCCTTCA GGATTACAAA ATTCAAAGTG CGTTGCTAGT ACCAACCCTA TTTTCATTCT 3840 

TCGCCAAAAG CACTCTGATT GACAAATACG ATTTATCTAA TTTACACGAA ATTGCTTCTG 3900 

GGGGCGCACC TCTTTCGAAA GAAGTCGGGG AAGCGGTTGC AAAACGCTTC CATCTTCCAG 3960 

GGATACGACA AGGATATGGG CTCACTGAGA CTACATCAGC TATTCTGATT ACACCCGAGG 4020 

GGGATGATAA ACCGGGCGCG GTCGGTAAAG TTGTTCCATT TTTTGAAGCG AAGGTTGTGG 4080 

ATCTGGATAC CGGGAAAACG CTGGGCGTTA ATCAGAGAGG CGAATTATGT GTCAGAGGAC 4140 

CTATGATTAT GTCCGGTTAT GTAAACAATC CGGAAGCGAC CAACGCCTTG ATTGACAAGG 4200 

ATGGATGGCT ACATTCTGGA GACATAGCTT ACTGGGACGA AGACGAACAC TTCTTCATAG 4260 

TTGACCGCTT GAAGTCTTTA ATTAAATACA AAGGATATCA GGTGGCCCCC GCTGAATTGG 4320 

AATCGATATT GTTACAACAC CCCAACATCT TCGACGCGGG CGTGGCAGGT CTTCCCGACG 4380 
ATGACGCCGG TGAACTTCCC GCCGCCGTTG TTGTTTTGGA GCACGGAAAG ACGATGACGG 4440
AAAAAGAGAT CGTGGATTAC GTCGCCAGTC AAGTAACAAC CGCGAAAAAG TTGCGCGGAG 4500
GAGTTGTGTT TGTGGACGAA GTACCGAAAG GTCTTACCGG AAAACTCGAC GCAAGAAAAA 4560
TCAGAGAGAT CCTCATAAAG GCCAAGAAGG GCGGAAAGTC CAAATTGTAA AATGTAACTG 4620
TATTCAGCGA TGACGAAATT CTTAGCTATT GTAATGACTC TAGAGGATCT TTGTGAAGGA 4680
ACCTTACTTC TGTGGTGTGA CATAATTGGA CAAACTACCT ACAGAGATTT AAAGCTCTAA 4740
GGTAAATATA AAATTTTTAA GTGTATAATG TGTTAAACTA CTGATTCTAA TTGTTTGTGT 4800
ATTTTAGATT CCAACCTATG GAACTGATGA ATGGGAGCAG TGGTGGAATG CCTTTAATGA 4860
GGAAAACCTG TTTTGCTCAG AAGAAATGCC ATCTAGTGAT GATGAGGCTA CTGCTGACTC 4920
TCAACATTCT ACTCCTCCAA AAAAGAAGAG AAAGGTAGAA GACCCCAAGG ACTTTCCTTC 4980
AGAATTGCTA AGTTTTTTGA GTCATGCTGT GTTTAGTAAT AGAACTCTTG CTTGCTTTGC 5040
TATTTACACC ACAAAGGAAA AAGCTGCACT GCTATACAAG AAAATTATGG AAAAATATTC 5100
TGTAACCTTT ATAAGTAGGC ATAACAGTTA TAATCATAAC ATACTGTTTT TTCTTACTCC 5160
ACACAGGCAT AGAGTGTCTG CTATTAATAA CTATGCTCAA AAATTGTGTA CCTTTAGCTT 5220
TTTAATTTGT AAAGGGGTTA ATAAGGAATA TTTGATGTAT AGTGCCTTGA CTAGAGATCA 5280
TAATCAGCCA TACCACATTT GTAGAGGTTT TACTTGCTTT AAAAAACCTC CCACACCTCC 5340
CCCTGAACCT GAAACATAAA ATGAATGCAA TTGTTGTTGT TAACTTGTTT ATTGCAGCTT 5400
ATAATGGTTA CAAATAAAGC AATAGCATCA CAAATTTCAC AAATAAAGCA TTTTTTTCAC 5460
TGCATTCTAG TTGTGGTTTG TCCAAACTCA TCAATGTATC TTATCATGTC TGGATCCCCA 5520
GGAAGCTCCT CTGTGTCCTC ATAAACCCTA ACCTCCTCTA CTTGAGAGGA CATTCCAATC 5580
ATAGGCTGCC CATCCACCCT CTGTGTCCTC CTGTTAATTA GGTCACTTAA CAAAAAGGAA 5640
ATTGGGTAGG GGTTTTTCAC AGACCGCTTT CTAAGGGTAA TTTTAAAATA TCTGGGAAGT 5700
CCCTTCCACT GCTGTGTTCC AGAAGTGTTG GTAAACAGCC CACAAATGTC AACAGCAGAA 5760
ACATACAAGC TGTCAGCTTT GCACAAGGGC CCAACACCCT GCTCAGCAAG AAGCACTGTG 5820
GTTGCTGTGT TAGTAATGTG CAAAACAGGA GGCACATTTT CCCCACCTGT GTAGGTTCCA 5880
AAATATCTAG TGTTTTCATT TTTACTTGGA TCAGGAACCC AGCACTCCAC TGGATAAGCA 5940
TTATCCTTAT CCAAAACAGC CTTGTGGTCA GTGTTCATCT GCTGACTGTC AACTGTAGCA 6000

TTTTTTGGGG TTACAGTTTG AGCAGGATAT TTGGTCCTGT AGTTTGCTAA CACACCCTGC 6060

AGCTCCAAAG GTTCCCCACC AACAGCAAAA AAATGAAAAT TTGACCCTTG AATGGGTTTT 6120 

CCAGCACCAT TTTCATGAGT TTTTTGTGTC CCTGAATGCA AGTTTAACAT AGCAGTTACC 6180 

CCAATAACCT CAGTTTTAAC AGTAACAGCT TCCCACATCA AAATATTTCC ACAGGTTAAG 6240 

TCCTCATTTA AATTAGGCAA AGGAA 6265 
(2) INFORMATION FOR SEQ ID NO: 10:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6254 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 
CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720
ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 
ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 
GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 
TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 
GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 
TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 
CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 
TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 
ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160 
CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220 
AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340 

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400 

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460 

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520 

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580 

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640 

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700 

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760 

TATCATGTCT GGATCCCAGC CAGACAAGGT TGTTGACACA AGACCCACAT CTGGTATAAA 2820 

AGGAGGCAGT GGCCCACAGA GGAGCACAGC TGTGTTTGGC TGCAGGGCCA AGAGCGCTGT 2880 

CAAGAAGACC CACACGCCCC CCTCCAGCAG CTGAATTCCA GCTGGCATTC CGGTACTGTT 2940 

GGTAAAATGG AAGACGCCAA AAACATAAAG AAAGGCCCGG CGCCATTCTA TCCTCTAGAG 3000 

GATGGAACCG CTGGAGAGCA ACTGCATAAG GCTATGAAGA GATACGCCCT GGTTCCTGGA 3060 

ACAATTGCTT TTACAGATGC ACATATCGAG GTGAACATCA CGTACGCGGA ATACTTCGAA 3120 

ATGTCCGTTC GGTTGGCAGA AGCTATGAAA CGATATGGGC TGAATACAAA TCACAGAATC 3180 

GTCGTATGCA GTGAAAACTC TCTTCAATTC TTTATGCCGG TGTTGGGCGC GTTATTTATC 3240 

GGAGTTGCAG TTGCGCCCGC GAACGACATT TATAATGAAC GTGAATTGCT CAACAGTATG 3300 

AACATTTCGC AGCCTACCGT AGTGTTTGTT TCCAAAAAGG GGTTGCAAAA AATTTTGAAC 3360 

GTGCAAAAAA AATTACCAAT AATCCAGAAA ATTATTATCA TGGATTCTAA AACGGATTAC 3420 

CAGGGATTTC AGTCGATGTA CACGTTCGTC ACATCTCATC TACCTCCCGG TTTTAATGAA 3480 

TACGATTTTG TACCAGAGTC CTTTGATCGT GACAAAACAA TTGCACTGAT AATGAATTCC 3540 

TCTGGATCTA CTGGGTTACC TAAGGGTGTG GCCCTTCCGC ATAGAACTGC CTGCGTCAGA 3600 

TTCTCGCATG CCAGAGATCC TATTTTTGGC AATCAAATCA TTCCGGATAC TGCGATTTTA 3660 

AGTGTTGTTC CATTCCATCA CGGTTTTGGA ATGTTTACTA CACTCGGATA TTTGATATGT 3720 

GGATTTCGAG TCGTCTTAAT GTATAGATTT GAAGAAGAGC TGTTTTTACG ATCCCTTCAG 3780 
GATTACAAAA TTCAAAGTGC GTTGCTAGTA CCAACCCTAT TTTCATTCTT CGCCAAAAGC 3840

ACTCTGATTG ACAAATACGA TTTATCTAAT TTACACGAAA TTGCTTCTGG GGGCGCACCT 3900 

CTTTCGAAAG AAGTCGGGGA AGCGGTTGCA AAACGCTTCC ATCTTCCAGG GATACGACAA 3960 

GGATATGGGC TCACTGAGAC TACATCAGCT ATTCTGATTA CACCCGAGGG GGATGATAAA 4020 

CCGGGCGCGG TCGGTAAAGT TGTTCCATTT TTTGAAGCGA AGGTTGTGGA TCTGGATACC 4080 

GGGAAAACGC TGGGCGTTAA TCAGAGAGGC GAATTATGTG TCAGAGGACC TATGATTATG 4140

TCCGGTTATG TAAACAATCC GGAAGCGACC AACGCCTTGA TTGACAAGGA TGGATGGCTA 4200 

CATTCTGGAG ACATAGCTTA CTGGGACGAA GACGAACACT TCTTCATAGT TGACCGCTTG 4260 

AAGTCTTTAA TTAAATACAA AGGATATCAG GTGGCCCCCG CTGAATTGGA ATCGATATTG 4320 

TTACAACACC CCAACATCTT CGACGCGGGC GTGGCAGGTC TTCCCGACGA TGACGCCGGT 4380 

GAACTTCCCG CCGCCGTTGT TGTTTTGGAG CACGGAAAGA CGATGACGGA AAAAGAGATC 4440 

GTGGATTACG TCGCCAGTCA AGTAACAACC GCGAAAAAGT TGCGCGGAGG AGTTGTGTTT 4500 

GTGGACGAAG TACCGAAAGG TCTTACCGGA AAACTCGACG CAAGAAAAAT CAGAGAGATC 4560

CTCATAAAGG CCAAGAAGGG CGGAAAGTCC AAATTGTAAA ATGTAACTGT ATTCAGCGAT 4620 

GACGAAATTC TTAGCTATTG TAATGACTCT AGAGGATCTT TGTGAAGGAA CCTTACTTCT 4680 

GTGGTGTGAC ATAATTGGAC AAACTACCTA CAGAGATTTA AAGCTCTAAG GTAAATATAA 4740 

AATTTTTAAG TGTATAATGT GTTAAACTAC TGATTCTAAT TGTTTGTGTA TTTTAGATTC 4800 

CAACCTATGG AACTGATGAA TGGGAGCAGT GGTGGAATGC CTTTAATGAG GAAAACCTGT 4860 

TTTGCTCAGA AGAAATGCCA TCTAGTGATG ATGAGGCTAC TGCTGACTCT CAACATTCTA 4920 

CTCCTCCAAA AAAGAAGAGA AAGGTAGAAG ACCCCAAGGA CTTTCCTTCA GAATTGCTAA 4980 

GTTTTTTGAG TCATGCTGTG TTTAGTAATA GAACTCTTGC TTGCTTTGCT ATTTACACCA 5040 

CAAAGGAAAA AGCTGCACTG CTATACAAGA AAATTATGGA AAAATATTCT GTAACCTTTA 5100 

TAAGTAGGCA TAACAGTTAT AATCATAACA TACTGTTTTT TCTTACTCCA CACAGGCATA 5160 

GAGTGTCTGC TATTAATAAC TATGCTCAAA AATTGTGTAC CTTTAGCTTT TTAATTTGTA 5220 

AAGGGGTTAA TAAGGAATAT TTGATGTATA GTGCCTTGAC TAGAGATCAT AATCAGCCAT 5280 

ACCACATTTG TAGAGGTTTT ACTTGCTTTA AAAAACCTCC CACACCTCCC CCTGAACCTG 5340 
AAACATAAAA TGAATGCAAT TGTTGTTGTT AACTTGTTTA TTGCAGCTTA TAATGGTTAC 5400

AAATAAAGCA ATAGCATCAC AAATTTCACA AATAAAGCAT TTTTTTCACT GCATTCTAGT 5460 

TGTGGTTTGT CCAAACTCAT CAATGTATCT TATCATGTCT GGATCCCCAG GAAGCTCCTC 5520 

TGTGTCCTCA TAAACCCTAA CCTCCTCTAC TTGAGAGGAC ATTCCAATCA TAGGCTGCCC 5580 

ATCCACCCTC TGTGTCCTCC TGTTAATTAG GTCACTTAAC AAAAAGGAAA TTGGGTAGGG 5640 

GTTTTTCACA GACCGCTTTC TAAGGGTAAT TTTAAAATAT CTGGGAAGTC CCTTCCACTG 5700 

CTGTGTTCCA GAAGTGTTGG TAAACAGCCC ACAAATGTCA ACAGCAGAAA CATACAAGCT 5760

GTCAGCTTTG CACAAGGGCC CAACACCCTG CTCAGCAAGA AGCACTGTGG TTGCTGTGTT 5820 

AGTAATGTGC AAAACAGGAG GCACATTTTC CCCACCTGTG TAGGTTCCAA AATATCTAGT 5880 

GTTTTCATTT TTACTTGGAT CAGGAACCCA GCACTCCACT GGATAAGCAT TATCCTTATC 5940 

CAAAACAGCC TTGTGGTCAG TGTTCATCTG CTGACTGTCA ACTGTAGCAT TTTTTGGGGT 6000 

TACAGTTTGA GCAGGATATT TGGTCCTGTA GTTTGCTAAC ACACCCTGCA GCTCCAAAGG 6060 

TTCCCCACCA ACAGCAAAAA AATGAAAATT TGACCCTTGA ATGGGTTTTC CAGCACCATT 6120 

TTCATGAGTT TTTTGTGTCC CTGAATGCAA GTTTAACATA GCAGTTACCC CAATAACCTC 6180 

AGTTTTAACA GTAACAGCTT CCCACATCAA AATATTTCCA CAGGTTAAGT CCTCATTTAA 6240 

ATTAGGCAAA GGAA 6254 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1442 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
GGTACCCAGG CTGCATAACC AGGAGGTGAG TGGCAGGTGA GTGAAATTTC ATCTGTAGTT 60 
ACAGCCACTC CTCATCACTC GCATTACCAC CAGAGCTCCA CTCCCTGTCA GATCAGCGGC 120
GGCATTAGAT TCTCATAGGA GCTCGAACCC TATTCTAAAC TGTTCATGTG AGGGATCTAG 180
GTTGCAAGCT CCCTATGAGA ATCTAATGCC TGATGATCTG TCACGGTCTC CCATCACCCC 240
TAGATGGGAC CATCTAGTTG CAGGAAAACA AGCTCAGGCT CCCACTGATT CTACACGATG 300
GTGAATTGTG GAATTATTTC ATTATATATA TTACAATGTA ATAATAATAG AAATAAAGCA 360
CACAATAAAT GTAATGTGCT TGAATCATCC CGAAACCATC CCACCCTGGT CTGTGAAAAA 420
ATTGTCTTCC ATGAAACCAG TCCCTGGTGC CAAAAACGTT GAGGACCACT GCTCCACAGA 480
ATCTATCGGT CACTCTTCCT CCCCTCACCC CCTTGCCCTA AAAGCACACC CTGCAAACCT 540
GCCATGAATT GACACTCTGT TTCTATCCCT TTTCCCCTTG TGTCTGTGTC TGGAGGAAGA 600
GGATAAAGGA CAAGCTGCCC CAAGTCCTAG CGGGCAGCTC GAGGAAGTGA AACTTACACG 660
TTGGTCTCCT GTTTCCTTAC CAAGCTTACC ATGGTAACCC CTGGTCCCGT TCAGCCACCA 720
CCACCCCACC CAGCACACCT CCAACCTCAG CCAGACAAGG TTGTTGACAC AAGAGAGCCC 780
TCAGGGGCAC AGAGAGAGTC TGGACACGTG GGGAGTCAGC CGTGTATCAT CGGAGGCGGC 840
CGGGCACATG GCAGGGATGA GGGAAAGACC AAGAGTCCTC TGTTGGGCCC AAGTCCTAGA 900
CAGACAAAAC CTAGACAATC ACGTGGCTGG CTGCATGCCT GTGGCTGTTG GGCTGGGCAG 960
GAGGAGGGAG GGGCGCTCTT TCCTGGAGGT GGTCCAGAGC ACCGGGTGGA CAGCCCTGGG 1020
GGAAAACTTC CACGTTTTGA TGGAGGTTAT CTTTGATAAC TCCACAGTGA CCTGGTTCGC 1080
CAAAGGAAAA GCAGGCAACG TGAGCTGTTT TTTTTTTCTC CAAGCTGAAC ACTAGGGGTC 1140
CTAGGCTTTT TGGGTCACCC GGCATGGCAG ACAGTCAACC TGGCAGGACA TCCGGGAGAG 1200
ACAGACACAG GCAGAGGGCA GAAAGGTCAA GGGAGGTTCT CAGGCCAAGG CTATTGGGGT 1260
TTGCTCAATT GTTCCTGAAT GCTCTTACAC ACGTACACAC ACAGAGCAGC ACACACACAC 1320
ACACACACAT GCCTCAGCAA GTCCCAGAGA GGGAGGTGTC GAGGGGGACC CGCTGGCTGT 1380
TCAGACGGAC TCCCAGAGCC AGTGAGTGGG TGGGGCTGGA ACATGAGTTC ATCTATTTCC 1440
TG 1442
(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 761 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:
AAGCTTACCA TGGTAACCCC TGGTCCCGTT CAGCCACCAC CACCCCACCC AGCACACCTC 60
CAACCTCAGC CAGACAAGGT TGTTGACACA AGAGAGCCCT CAGGGGCACA GAGAGAGTCT 120
GGACACGTGG GGAGTCAGCC GTGTATCATC GGAGGCGGCC GGGCACATGG CAGGGATGAG 180
GGAAAGACCA AGAGTCCTCT GTTGGGCCCA AGTCCTAGAC AGACAAAACC TAGACAATCA 240
CGTGGCTGGC TGCATGCCTG TGGCTGTTGG GCTGGGCAGG AGGAGGGAGG GGCGCTCTTT 300
CCTGGAGGTG GTCCAGAGCA CCGGGTGGAC AGCCCTGGGG GAAAACTTCC ACGTTTTGAT 360
GGAGGTTATC TTTGATAACT CCACAGTGAC CTGGTTCGCC AAAGGAAAAG CAGGCAACGT 420
GAGCTGTTTT TTTTTTCTCC AAGCTGAACA CTAGGGGTCC TAGGCTTTTT GGGTCACCCG 480
GCATGGCAGA CAGTCAACCT GGCAGGACAT CCGGGAGAGA CAGACACAGG CAGAGGGCAG 540
AAAGGTCAAG GGAGGTTCTC AGGCCAAGGC TATTGGGGTT TGCTCAATTG TTCCTGAATG 600
CTCTTACACA CGTACACACA CAGAGCAGCA CACACACACA CACACACATG CCTCAGCAAG 660
TCCCAGAGAG GGAGGTGTCG AGGGGGACCC GCTGGCTGTT CAGACGGACT CCCAGAGCCA 720
GTGAGTGGGT GGGGCTGGAA CATGAGTTCA TCTATTTCCT G 761
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 165 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 






(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
AAGCTTACCA TGGTAACCCC TGGTCCCGTT CAGCCACCAC CACCCCACCC AGCACACCTC 60 
CAACCTCAGC CAGACAAGGT TGTTGACACA AGAGAGCCCT CAGGGGCACA GAGAGAGTCT 120 
GGACACGTGG GGAGTCAGCC GTGTATCATC GGAGGCGGCC GGGCA 165 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
AGTTCATCTA TTTCCT 16 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GTGGGGAGTC AGCCGTGTAT CATCG 25
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 36 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
CTCCAACCTC AGCCAGACAA GGTTGTTGAC ACAAGA 36 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
GCCAGACAAG GTTGTTGACA CAAGA 25 
(2) INFORMATION FOR SEQ ID NO: 18: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 115 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 






(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 
CCCACATCTG GTATAAAAGG AGGCAGTGGC CCACAGAGGA GCACAGCTGT GTTTGGCTGC 60 
AGGGCCAAGA GCGCTGTCAA GAAGACCCAC ACGCCCCCCT CCAGCAGCTG AATTC 115 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 345 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
GGCCAGACGC CAACAAGGTA GGAGCTGGAG CATTCGGGCT GGGTTTCACC CCACCGCACG 60 
GAGGCCTTTT GGGGTGGAGC CCTCAGGCTC AGGGCATACT ACAAACTTTG CCAGCAAATC 120 
CGCCTCCTGC CTCCACCAAT CGCCAGTCAG GAAGGCAGCC TACCCCGCTG TCTCCACCTT 180 
TGAGAAACAC TCATCCTCAG GCCATGCAGT GGAATTCCAC AACCTTCCAC CAAACTCTGC 240 
AAGATCCCAG AGTGAGAGGC CTGTATTTCC CTGCTGGTGG CTCCAGTTCA GGAACAGTAA 300
ACCCTGTTCT GACTACTGCC TCTCCCTTAT CGTCAATCTT CTCGA 345
(2) INFORMATION FOR SEQ ID NO: 20:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4302 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 

TCGACCTCGA GGGATCTTTG TGAAGGAACC TTACTTCTGT GGTGTGACAT AATTGGACAA 60 

ACTACCTACA GAGATTTAAA GCTCTAAGGT AAATATAAAA TTTTTAAGTG TATAATGTGT 120 

TAAACTACTG ATTCTAATTG TTTGTGTATT TTAGATTCCA ACCTATGGAA CTGATGAATG 180 

GGAGCAGTGG TGGAATGCCT TTAATGAGGA AAACCTGTTT TGCTCAGAAG AAATGCCATC 240 

TAGTGATGAT GAGGCTACTG CTGACTCTCA ACATTCTACT CCTCCAAAAA AGAAGAGAAA 300 

GGTAGAAGAC CCCAAGGACT TTCCTTCAGA ATTGCTAAGT TTTTTGAGTC ATGCTGTGTT 360 

TAGTAATAGA ACTCTTGCTT GCTTTGCTAT TTACACCACA AAGGAAAAAG CTGCACTGCT 420 

ATACAAGAAA ATTATGGAAA AATATTCTGT AACCTTTATA AGTAGGCATA ACAGTTATAA 480 

TCATAACATA CTGTTTTTTC TTACTCCACA CAGGCATAGA GTGTCTGCTA TTAATAACTA 540 

TGCTCAAAAA TTGTGTACCT TTAGCTTTTT AATTTGTAAA GGGGTTAATA AGGAATATTT 600 

GATGTATAGT GCCTTGACTA GAGATCATAA TCAGCCATAC CACATTTGTA GAGGTTTTAC 660 

TTGCTTTAAA AAACCTCCCA CACCTCCCCC TGAACCTGAA ACATAAAATG AATGCAATTG 720 

TTGTTGTTAA CTTGTTTATT GCAGCTTATA ATGGTTACAA ATAAAGCAAT AGCATCACAA 780 

ATTTCACAAA TAAAGCATTT TTTTCACTGC ATTCTAGTTG TGGTTTGTCC AAACTCATCA 840 

ATGTATCTTA TCATGTCTGG ATCCGGCTGT GGAATGTGTG TCAGTTAGGG TGTGGAAAGT 900 

CCCCAGGCTC CCCAGCAGGC AGAAGTATGC AAAGCATGCA TCTCAATTAG TCAGCAACCA 960 

GGTGTGGAAA GTCCCCAGGC TCCCCAGCAG GCAGAAGTAT GCAAAGCATG CATCTCAATT 1020 

AGTCAGCAAC CATAGTCCCG CCCCTAACTC CGCCCATCCC GCCCCTAACT CCGCCCAGTT 1080 

CCGCCCATTC TCCGCCCCAT GGCTGACTAA TTTTTTTTAT TTATGCAGAG GCCGAGGCCG 1140 

CCTCGGCCTC TGAGCTATTC CAGAAGTAGT GAGGAGGCTT TTTTGGAGGC CTAGGCTTTT 1200 

GCAAAAAGCT TCACGCTGCC GCAAGCACTC AGGGCGCAAG GGCTGCTAAA GGAAGCGGAA 1260 

CACGTAGAAA GCCAGTCCGC AGAAACGGTG CTGACCCCGG ATGAATGTCA GCTACTGGGC 1320 
TATCTGGACA AGGGAAAACG CAAGCGCAAA GAGAAAGCAG GTAGCTTGCA GTGGGCTTAC 1380
ATGGCGATAG CTAGACTGGG CGGTTTTATG GACAGCAAGC GAACCGGAAT TGCCAGCTGG 1440
GGCGCCCTCT GGTAAGGTTG GGAAGCCCTG CAAAGTAAAC TGGATGGCTT TCTTGCCGCC 1500
AAGGATCTGA TGGCGCAGGG GATCAAGATC TGATCAAGAG ACAGGATGAG GATCGTTTCG 1560
CATGATTGAA CAAGATGGAT TGCACGCAGG TTCTCCGGCC GCTTGGGTGG AGAGGCTATT 1620
CGGCTATGAC TGGGCACAAC AGACAATCGG CTGCTCTGAT GCCGCCGTGT TCCGGCTGTC 1680
AGCGCAGGGG CGCCCGGTTC TTTTTGTCAA GACCGACCTG TCCGGTGCCC TGAATGAACT 1740
GCAGGACGAG GCAGCGCGGC TATCGTGGCT GGCCACGACG GGCGTTCCTT GCGCAGCTGT 1800
GCTCGACGTT GTCACTGAAG CGGGAAGGGA CTGGCTGCTA TTGGGCGAAG TGCCGGGGCA 1860
GGATCTCCTG TCATCTCACC TTGCTCCTGC CGAGAAAGTA TCCATCATGG CTGATGCAAT 1920
GCGGCGGCTG CATACGCTTG ATCCGGCTAC CTGCCCATTC GACCACCAAG CGAAACATCG 1980
CATCGAGCGA GCACGTACTC GGATGGAAGC CGGTCTTGTC GATCAGGATG ATCTGGACGA 2040
AGAGCATCAG GGGCTCGCGC CAGCCGAACT GTTCGCCAGG CTCAAGGCGC GCATGCCCGA 2100
CGGCGAGGAT CTCGTCGTGA CCCATGGCGA TGCCTGCTTG CCGAATATCA TGGTGGAAAA 2160
TGGCCGCTTT TCTGGATTCA TCGACTGTGG CCGGCTGGGT GTGGCGGACC GCTATCAGGA 2220
CATAGCGTTG GCTACCCGTG ATATTGCTGA AGAGCTTGGC GGCGAATGGG CTGACCGCTT 2280
CCTCGTGCTT TACGGTATCG CCGCTCCCGA TTCGCAGCGC ATCGCCTTCT ATCGCCTTCT 2340
TGACGAGTTC TTCTGAGCGG GACTCTGGGG TTCGAAATGA CCGACCAAGC GACGCCCAAC 2400
CTGCCATCAC GAGATTTCGA TTCCACCGCC GCCTTCTATG AAAGGTTGGG CTTCGGAATC 2460
GTTTTCCGGG ACGCCGGCTG GATGATCCTC CAGCGCGGGG ATCTCATGCT GGAGTTCTTC 2520
GCCCACCCCG GGCTCGATCC CCTCGCGAGT TGGTTCAGCT GCTGCCTGAG GCTGGACGAC 2580
CTCGCGGAGT TCTACCGGCA GTGCAAATCC GTCGGCATCC AGGAAACCAG CAGCGGCTAT 2640
CCGCGCATCC ATGCCCCCGA ACTGCAGGAG TGGGGAGGCA CGATGGCCGC TTTGGTCCCG 2700
GATCTTTGTG AAGGAACCTT ACTTCTGTGG TGTGACATAA TTGGACAAAC TACCTACAGA 2760
GATTTAAAGC TCTAAGGTAA ATATAAAATT TTTAAGTGTA TAATGTGTTA AACTACTGAT 2820
TCTAATTGTT TGTGTATTTT AGATTCCAAC CTATGGAACT GATGAATGGG AGCAGTGGTG 2880
GAATGCCTTT AATGAGGAAA ACCTGTTTTG CTCAGAAGAA ATGCCATCTA GTGATGATGA 2940

GGCTACTGCT GACTCTCAAC ATTCTACTCC TCCAAAAAAG AAGAGAAAGG TAGAAGACCC 3000 

CAAGGACTTT CCTTCAGAAT TGCTAAGTTT TTTGAGTCAT GCTGTGTTTA GTAATAGAAC 3060 

TCTTGCTTGC TTTGCTATTT ACACCACAAA GGAAAAAGCT GCACTGCTAT ACAAGAAAAT 3120 

TATGGAAAAA TATTCTGTAA CCTTTATAAG TAGGCATAAC AGTTATAATC ATAACATACT 3180 

GTTTTTTCTT ACTCCACACA GGCATAGAGT GTCTGCTATT AATAACTATG CTCAAAAATT 3240 

GTGTACCTTT AGCTTTTTAA TTTGTAAAGG GGTTAATAAG GAATATTTGA TGTATAGTGC 3300 

CTTGACTAGA GATCATAATC AGCCATACCA CATTTGTAGA GGTTTTACTT GCTTTAAAAA 3360 

ACCTCCCACA CCTCCCCCTG AACCTGAAAC ATAAAATGAA TGCAATTGTT GTTGTTAACT 3420 

TGTTTATTGC AGCTTATAAT GGTTACAAAT AAAGCAATAG CATCACAAAT TTCACAAATA 3480 

AAGCATTTTT TTCACTGCAT TCTAGTTGTG GTTTGTCCAA ACTCATCAAT GTATCTTATC 3540 

ATGTCTGGAT CCCCAGGAAG CTCCTCTGTG TCCTCATAAA CCCTAACCTC CTCTACTTGA 3600 

GAGGACATTC CAATCATAGG CTGCCCATCC ACCCTCTGTG TCCTCCTGTT AATTAGGTCA 3660 

CTTAACAAAA AGGAAATTGG GTAGGGGTTT TTCACAGACC GCTTTCTAAG GGTAATTTTA 3720 

AAATATCTGG GAAGTCCCTT CCACTGCTGT GTTCCAGAAG TGTTGGTAAA CAGCCCACAA 3780 

ATGTCAACAG CAGAAACATA CAAGCTGTCA GCTTTGCACA AGGGCCCAAC ACCCTGCTCA 3840 

TCAAGAAGCA CTGTGGTTGC TGTGTTAGTA ATGTGCAAAA CAGGAGGCAC ATTTTCCCCA 3900 

CCTGTGTAGG TTCCAAAATA TCTAGTGTTT TCATTTTTAC TTGGATCAGG AACCCAGCAC 3960 

TCCACTGGAT AAGCATTATC CTTATCCAAA ACAGCCTTGT GGTCAGTGTT CATCTGCTGA 4020 

CTGTCAACTG TAGCATTTTT TGGGGTTACA GTTTGAGCAG GATATTTGGT CCTGTAGTTT 4080 

GCTAACACAC CCTGCAGCTC CAAAGGTTCC CCACCAACAG CAAAAAAATG AAAATTTGAC 4140 

CCTTGAATGG GTTTTCCAGC ACCATTTTCA TGAGTTTTTT GTGTCCCTGA ATGCAAGTTT 4200 

AACATAGCAG TTACCCCAAT AACCTCAGTT TTAACAGTAA CAGCTTCCCA CATCAAAATA 4260 

TTTCCACAGG TTAAGTCCTC ATTTAAATTA GGCAAAGGAA TT 4302 
(2) INFORMATION FOR SEQ ID NO: 21: 
(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 6170 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 
AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140
GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200
CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260
CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320
TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380
TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440
TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500
TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560
GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620
ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680
GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740
GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800
CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860
GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920
TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980
CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040
TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100
ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160
CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220
AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280
AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340
TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400
GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460
ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520
TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580
AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640
AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760 

TATCATGTCT GGATCCCAAG CTTGCATGCC TGCAGGTCGA CTCTAGAGGA TCCCCGGGTA 2820 

CCGAGCTCGA ATTCCAGCTG GCATTCCGGT ACTGTTGGTA AAATGGAAGA CGCCAAAAAC 2880 

ATAAAGAAAG GCCCGGCGCC ATTCTATCCT CTAGAGGATG GAACCGCTGG AGAGCAACTG 2940 

CATAAGGCTA TGAAGAGATA CGCCCTGGTT CCTGGAACAA TTGCTTTTAC AGATGCACAT 3000 

ATCGAGGTGA ACATCACGTA CGCGGAATAC TTCGAAATGT CCGTTCGGTT GGCAGAAGCT 3060

ATGAAACGAT ATGGGCTGAA TACAAATCAC AGAATCGTCG TATGCAGTGA AAACTCTCTT 3120 

CAATTCTTTA TGCCGGTGTT GGGCGCGTTA TTTATCGGAG TTGCAGTTGC GCCCGCGAAC 3180 

GACATTTATA ATGAACGTGA ATTGCTCAAC AGTATGAACA TTTCGCAGCC TACCGTAGTG 3240 

TTTGTTTCCA AAAAGGGGTT GCAAAAAATT TTGAACGTGC AAAAAAAATT ACCAATAATC 3300 

CAGAAAATTA TTATCATGGA TTCTAAAACG GATTACCAGG GATTTCAGTC GATGTACACG 3360 

TTCGTCACAT CTCATCTACC TCCCGGTTTT AATGAATACG ATTTTGTACC AGAGTCCTTT 3420 

GATCGTGACA AAACAATTGC ACTGATAATG AATTCCTCTG GATCTACTGG GTTACCTAAG 3480 

GGTGTGGCCC TTCCGCATAG AACTGCCTGC GTCAGATTCT CGCATGCCAG AGATCCTATT 3540 

TTTGGCAATC AAATCATTCC GGATACTGCG ATTTTAAGTG TTGTTCCATT CCATCACGGT 3600 

TTTGGAATGT TTACTACACT CGGATATTTG ATATGTGGAT TTCGAGTCGT CTTAATGTAT 3660 

AGATTTGAAG AAGAGCTGTT TTTACGATCC CTTCAGGATT ACAAAATTCA AAGTGCGTTG 3720 

CTAGTACCAA CCCTATTTTC ATTCTTCGCC AAAAGCACTC TGATTGACAA ATACGATTTA 3780 

TCTAATTTAC ACGAAATTGC TTCTGGGGGC GCACCTCTTT CGAAAGAAGT CGGGGAAGCG 3840 

GTTGCAAAAC GCTTCCATCT TCCAGGGATA CGACAAGGAT ATGGGCTCAC TGAGACTACA 3900 

TCAGCTATTC TGATTACACC CGAGGGGGAT GATAAACCGG GCGCGGTCGG TAAAGTTGTT 3960 

CCATTTTTTG AAGCGAAGGT TGTGGATCTG GATACCGGGA AAACGCTGGG CGTTAATCAG 4020 

AGAGGCGAAT TATGTGTCAG AGGACCTATG ATTATGTCCG GTTATGTAAA CAATCCGGAA 4080 

GCGACCAACG CCTTGATTGA CAAGGATGGA TGGCTACATT CTGGAGACAT AGCTTACTGG 4140 

GACGAAGACG AACACTTCTT CATAGTTGAC CGCTTGAAGT CTTTAATTAA ATACAAAGGA 4200 
TATCAGGTGG CCCCCGCTGA ATTGGAATCG ATATTGTTAC AACACCCCAA CATCTTCGAC 4260
GCGGGCGTGG CAGGTCTTCC CGACGATGAC GCCGGTGAAC TTCCCGCCGC CGTTGTTGTT 4320 
TTGGAGCACG GAAAGACGAT GACGGAAAAA GAGATCGTGG ATTACGTCGC CAGTCAAGTA 4380 

ACAACCGCGA AAAAGTTGCG CGGAGGAGTT GTGTTTGTGG ACGAAGTACC GAAAGGTCTT 4440 

ACCGGAAAAC TCGACGCAAG AAAAATCAGA GAGATCCTCA TAAAGGCCAA GAAGGGCGGA 4500 

AAGTCCAAAT TGTAAAATGT AACTGTATTC AGCGATGACG AAATTCTTAG CTATTGTAAT 4560 

GACTCTAGAG GATCTTTGTG AAGGAACCTT ACTTCTGTGG TGTGACATAA TTGGACAAAC 4620 

TACCTACAGA GATTTAAAGC TCTAAGGTAA ATATAAAATT TTTAAGTGTA TAATGTGTTA 4680 

AACTACTGAT TCTAATTGTT TGTGTATTTT AGATTCCAAC CTATGGAACT GATGAATGGG 4740 

AGCAGTGGTG GAATGCCTTT AATGAGGAAA ACCTGTTTTG CTCAGAAGAA ATGCCATCTA 4800 

GTGATGATGA GGCTACTGCT GACTCTCAAC ATTCTACTCC TCCAAAAAAG AAGAGAAAGG 4860 

TAGAAGACCC CAAGGACTTT CCTTCAGAAT TGCTAAGTTT TTTGAGTCAT GCTGTGTTTA 4920 

GTAATAGAAC TCTTGCTTGC TTTGCTATTT ACACCACAAA GGAAAAAGCT GCACTGCTAT 4980 

ACAAGAAAAT TATGGAAAAA TATTCTGTAA CCTTTATAAG TAGGCATAAC AGTTATAATC 5040 

ATAACATACT GTTTTTTCTT ACTCCACACA GGCATAGAGT GTCTGCTATT AATAACTATG 5100 

CTCAAAAATT GTGTACCTTT AGCTTTTTAA TTTGTAAAGG GGTTAATAAG GAATATTTGA 5160 

TGTATAGTGC CTTGACTAGA GATCATAATC AGCCATACCA CATTTGTAGA GGTTTTACTT 5220 

GCTTTAAAAA ACCTCCCACA CCTCCCCCTG AACCTGAAAC ATAAAATGAA TGCAATTGTT 5280 

GTTGTTAACT TGTTTATTGC AGCTTATAAT GGTTACAAAT AAAGCAATAG CATCACAAAT 5340 

TTCACAAATA AAGCATTTTT TTCACTGCAT TCTAGTTGTG GTTTGTCCAA ACTCATCAAT 5400 

GTATCTTATC ATGTCTGGAT CCCCAGGAAG CTCCTCTGTG TCCTCATAAA CCCTAACCTC 5460 

CTCTACTTGA GAGGACATTC CAATCATAGG CTGCCCATCC ACCCTCTGTG TCCTCCTGTT 5520 

AATTAGGTCA CTTAACAAAA AGGAAATTGG GTAGGGGTTT TTCACAGACC GCTTTCTAAG 5580 

GGTAATTTTA AAATATCTGG GAAGTCCCTT CCACTGCTGT GTTCCAGAAG TGTTGGTAAA 5640 

CAGCCCACAA ATGTCAACAG CAGAAACATA CAAGCTGTCA GCTTTGCACA AGGGCCCAAC 5700 

ACCCTGCTCA GCAAGAAGCA CTGTGGTTGC TGTGTTAGTA ATGTGCAAAA CAGGAGGCAC 5760 
ATTTTCCCCA CCTGTGTAGG TTCCAAAATA TCTAGTGTTT TCATTTTTAC TTGGATCAGG 5820

AACCCAGCAC TCCACTGGAT AAGCATTATC CTTATCCAAA ACAGCCTTGT GGTCAGTGTT 5880 

CATCTGCTGA CTGTCAACTG TAGCATTTTT TGGGGTTACA GTTTGAGCAG GATATTTGGT 5940 

CCTGTAGTTT GCTAACACAC CCTGCAGCTC CAAAGGTTCC CCACCAACAG CAAAAAAATG 6000 

AAAATTTGAC CCTTGAATGG GTTTTCCAGC ACCATTTTCA TGAGTTTTTT GTGTCCCTGA 6060 

ATGCAAGTTT AACATAGCAG TTACCCCAAT AACCTCAGTT TTAACAGTAA CAGCTTCCCA 6120 

CATCAAAATA TTTCCACAGG TTAAGTCCTC ATTTAAATTA GGCAAAGGAA 6170 
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10533 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160 

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220 

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280 

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340 

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400 

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460 

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520 

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580 

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640 

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700 

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760 

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820 

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880 

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940 

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000 

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060 

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120 

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180 

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240 

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300 

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360 

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420 

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480 

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540 

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600 

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660 

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720 

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780 

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840 

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900 

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960 

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020 

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080 

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140 

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200 

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260 

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320 

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380 

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440 

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500 

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560 

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620 

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680 

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740 

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800 

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860 

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920 

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980 

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040 

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100 

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160 

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220 

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280 

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340 

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400 

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460 

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520 

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580 

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640 

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700 

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760 

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG GGGTTTTTCA CAGACCGCTT 5820 

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880 

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940 

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000 

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060 

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120 

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180 

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240 

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300 

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360 

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420 

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480 

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540 

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600 

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660 

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720 

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780 

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840 

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900 

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960 

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020 

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080 

CACCCACATC TGGTATAAAA GGAGGCAGTG GCCCACAGAG GAGCACAGCT GTGTTTGGCT 7140 

GCAGGGCCAA GAGCGCTGTC AAGAAGACCC ACACGCCCCC CTCCAGCAGC TGAATTCCAG 7200 

CTGGCATTCC GGTACTGTTG GTAAAATGGA AGACGCCAAA AACATAAAGA AAGGCCCGGC 7260 

GCCATTCTAT CCTCTAGAGG ATGGAACCGC TGGAGAGCAA CTGCATAAGG CTATGAAGAG 7320 

ATACGCCCTG GTTCCTGGAA CAATTGCTTT TACAGATGCA CATATCGAGG TGAACATCAC 7380 

GTACGCGGAA TACTTCGAAA TGTCCGTTCG GTTGGCAGAA GCTATGAAAC GATATGGGCT 7440 

GAATACAAAT CACAGAATCG TCGTATGCAG TGAAAACTCT CTTCAATTCT TTATGCCGGT 7500 

GTTGGGCGCG TTATTTATCG GAGTTGCAGT TGCGCCCGCG AACGACATTT ATAATGAACG 7560 

TGAATTGCTC AACAGTATGA ACATTTCGCA GCCTACCGTA GTGTTTGTTT CCAAAAAGGG 7620 

GTTGCAAAAA ATTTTGAACG TGCAAAAAAA ATTACCAATA ATCCAGAAAA TTATTATCAT 7680 

GGATTCTAAA ACGGATTACC AGGGATTTCA GTCGATGTAC ACGTTCGTCA CATCTCATCT 7740 

ACCTCCCGGT TTTAATGAAT ACGATTTTGT ACCAGAGTCC TTTGATCGTG ACAAAACAAT 7800 

TGCACTGATA ATGAATTCCT CTGGATCTAC TGGGTTACCT AAGGGTGTGG CCCTTCCGCA 7860 

TAGAACTGCC TGCGTCAGAT TCTCGCATGC CAGAGATCCT ATTTTTGGCA ATCAAATCAT 7920 

TCCGGATACT GCGATTTTAA GTGTTGTTCC ATTCCATCAC GGTTTTGGAA TGTTTACTAC 7980 

ACTCGGATAT TTGATATGTG GATTTCGAGT CGTCTTAATG TATAGATTTG AAGAAGAGCT 8040 

GTTTTTACGA TCCCTTCAGG ATTACAAAAT TCAAAGTGCG TTGCTAGTAC CAACCCTATT 8100 

TTCATTCTTC GCCAAAAGCA CTCTGATTGA CAAATACGAT TTATCTAATT TACACGAAAT 8160 

TGCTTCTGGG GGCGCACCTC TTTCGAAAGA AGTCGGGGAA GCGGTTGCAA AACGCTTCCA 8220 

TCTTCCAGGG ATACGACAAG GATATGGGCT CACTGAGACT ACATCAGCTA TTCTGATTAC 8280 

ACCCGAGGGG GATGATAAAC CGGGCGCGGT CGGTAAAGTT GTTCCATTTT TTGAAGCGAA 8340 

GGTTGTGGAT CTGGATACCG GGAAAACGCT GGGCGTTAAT CAGAGAGGCG AATTATGTGT 8400 

CAGAGGACCT ATGATTATGT CCGGTTATGT AAACAATCCG GAAGCGACCA ACGCCTTGAT 8460 

TGACAAGGAT GGATGGCTAC ATTCTGGAGA CATAGCTTAC TGGGACGAAG ACGAACACTT 8520 

CTTCATAGTT GACCGCTTGA AGTCTTTAAT TAAATACAAA GGATATCAGG TGGCCCCCGC 8580 

TGAATTGGAA TCGATATTGT TACAACACCC CAACATCTTC GACGCGGGCG TGGCAGGTCT 8640 

TCCCGACGAT GACGCCGGTG AACTTCCCGC CGCCGTTGTT GTTTTGGAGC ACGGAAAGAC 8700 

GATGACGGAA AAAGAGATCG TGGATTACGT CGCCAGTCAA GTAACAACCG CGAAAAAGTT 8760 

GCGCGGAGGA GTTGTGTTTG TGGACGAAGT ACCGAAAGGT CTTACCGGAA AACTCGACGC 8820 

AAGAAAAATC AGAGAGATCC TCATAAAGGC CAAGAAGGGC GGAAAGTCCA AATTGTAAAA 8880 

TGTAACTGTA TTCAGCGATG ACGAAATTCT TAGCTATTGT AATGACTCTA GAGGATCTTT 8940 

GTGAAGGAAC CTTACTTCTG TGGTGTGACA TAATTGGACA AACTACCTAC AGAGATTTAA 9000 

AGCTCTAAGG TAAATATAAA ATTTTTAAGT GTATAATGTG TTAAACTACT GATTCTAATT 9060 

GTTTGTGTAT TTTAGATTCC AACCTATGGA ACTGATGAAT GGGAGCAGTG GTGGAATGCC 9120 

TTTAATGAGG AAAACCTGTT TTGCTCAGAA GAAATGCCAT CTAGTGATGA TGAGGCTACT 9180 

GCTGACTCTC AACATTCTAC TCCTCCAAAA AAGAAGAGAA AGGTAGAAGA CCCCAAGGAC 9240 

TTTCCTTCAG AATTGCTAAG TTTTTTGAGT CATGCTGTGT TTAGTAATAG AACTCTTGCT 9300 

TGCTTTGCTA TTTACACCAC AAAGGAAAAA GCTGCACTGC TATACAAGAA AATTATGGAA 9360 

AAATATTCTG TAACCTTTAT AAGTAGGCAT AACAGTTATA ATCATAACAT ACTGTTTTTT 9420 

CTTACTCCAC ACAGGCATAG AGTGTCTGCT ATTAATAACT ATGCTCAAAA ATTGTGTACC 9480 

TTTAGCTTTT TAATTTGTAA AGGGGTTAAT AAGGAATATT TGATGTATAG TGCCTTGACT 9540 

AGAGATCATA ATCAGCCATA CCACATTTGT AGAGGTTTTA CTTGCTTTAA AAAACCTCCC 9600 

ACACCTCCCC CTGAACCTGA AACATAAAAT GAATGCAATT GTTGTTGTTA ACTTGTTTAT 9660 

TGCAGCTTAT AATGGTTACA AATAAAGCAA TAGCATCACA AATTTCACAA ATAAAGCATT 9720 

TTTTTCACTG CATTCTAGTT GTGGTTTGTC CAAACTCATC AATGTATCTT ATCATGTCTG 9780 

GATCCCCAGG AAGCTCCTCT GTGTCCTCAT AAACCCTAAC CTCCTCTACT TGAGAGGACA 9840 

TTCCAATCAT AGGCTGCCCA TCCACCCTCT GTGTCCTCCT GTTAATTAGG TCACTTAACA 9900 

AAAAGGAAAT TGGGTAGGGG TTTTTCACAG ACCGCTTTCT AAGGGTAATT TTAAAATATC 9960 

TGGGAAGTCC CTTCCACTGC TGTGTTCCAG AAGTGTTGGT AAACAGCCCA CAAATGTCAA 10020 

CAGCAGAAAC ATACAAGCTG TCAGCTTTGC ACAAGGGCCC AACACCCTGC TCAGCAAGAA 10080 

GCACTGTGGT TGCTGTGTTA GTAATGTGCA AAACAGGAGG CACATTTTCC CCACCTGTGT 10140 

AGGTTGCAAA ATATCTAGTG TTTTCATTTT TACTTGGATC AGGAACCCAG CACTCCACTG 10200 

GATAAGCATT ATCCTTATCC AAAACAGCCT TGTGGTCAGT GTTCATCTGC TGACTGTCAA 10260 

CTGTAGCATT TTTTGGGGTT ACAGTTTGAG CAGGATATTT GGTCCTGTAG TTTGCTAACA 10320 

CACCCTGCAG CTCCAAAGGT TCCCCACCAA CAGCAAAAAA ATGAAAATTT GACCCTTGAA 10380 

TGGGTTTTCC AGCACCATTT TCATGAGTTT TTTGTGTCCC TGAATGCAAG TTTAACATAG 10440 

CAGTTACCCC AATAACCTCA GTTTTAACAG TAACAGCTTC CCACATCAAA ATATTTCCAC 10500 

AGGTTAAGTC CTCATTTAAA TTAGGCAAAG GAA 10533 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6229 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 
AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 
TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 
GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 
TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 
AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 
CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 
AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160 

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220 

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280 

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340 

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400 

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460 

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520 

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580 

AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640 

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700 

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760 

TATCATGTCT GGATCCCACC CACATCTGGT ATAAAAGGAG GCAGTGGCCC ACAGAGGAGC 2820 

ACAGCTGTGT TTGGCTGCAG GGCCAAGAGC GCTGTCAAGA AGACCCACAC GCCCCCCTCC 2880 

AGCAGCTGAA TTCCAGCTGG CATTCCGGTA CTGTTGGTAA AATGGAAGAC GCCAAAAACA 2940 

TAAAGAAAGG CCCGGCGCCA TTCTATCCTC TAGAGGATGG AACCGCTGGA GAGCAACTGC 3000 

ATAAGGCTAT GAAGAGATAC GCCCTGGTTC CTGGAACAAT TGCTTTTACA GATGCACATA 3060 

TCGAGGTGAA CATCACGTAC GCGGAATACT TCGAAATGTC CGTTCGGTTG GCAGAAGCTA 3120 

TGAAACGATA TGGGCTGAAT ACAAATCACA GAATCGTCGT ATGCAGTGAA AACTCTCTTC 3180 

AATTCTTTAT GCCGGTGTTG GGCGCGTTAT TTATCGGAGT TGCAGTTGCG CCCGCGAACG 3240 

ACATTTATAA TGAACGTGAA TTGCTCAACA GTATGAACAT TTCGCAGCCT ACCGTAGTGT 3300 

TTGTTTCCAA AAAGGGGTTG CAAAAAATTT TGAACGTGCA AAAAAAATTA CCAATAATCC 3360 

AGAAAATTAT TATCATGGAT TCTAAAACGG ATTACCAGGG ATTTCAGTCG ATGTACACGT 3420 

TCGTCACATC TCATCTACCT CCCGGTTTTA ATGAATACGA TTTTGTACCA GAGTCCTTTG 3480 

ATCGTGACAA AACAATTGCA CTGATAATGA ATTCCTCTGG ATCTACTGGG TTACCTAAGG 3540 

GTGTGGCCCT TCCGCATAGA ACTGCCTGCG TCAGATTCTC GCATGCCAGA GATCCTATTT 3600 

TTGGCAATCA AATCATTCCG GATACTGCGA TTTTAAGTGT TGTTCCATTC CATCACGGTT 3660 

TTGGAATGTT TACTACACTC GGATATTTGA TATGTGGATT TCGAGTCGTC TTAATGTATA 3720 

GATTTGAAGA AGAGCTGTTT TTACGATCCC TTCAGGATTA CAAAATTCAA AGTGCGTTGC 3780 

TAGTACCAAC CCTATTTTCA TTCTTCGCCA AAAGCACTCT GATTGACAAA TACGATTTAT 3840 

CTAATTTACA CGAAATTGCT TCTGGGGGCG CACCTCTTTC GAAAGAAGTC GGGGAAGCGG 3900 

TTGCAAAACG CTTCCATCTT CCAGGGATAC GACAAGGATA TGGGCTCACT GAGACTACAT 3960 

CAGCTATTCT GATTACACCC GAGGGGGATG ATAAACCGGG CGCGGTCGGT AAAGTTGTTC 4020 

CATTTTTTGA AGCGAAGGTT GTGGATCTGG ATACCGGGAA AACGCTGGGC GTTAATCAGA 4080 

GAGGCGAATT ATGTGTCAGA GGACCTATGA TTATGTCCGG TTATGTAAAC AATCCGGAAG 4140 

CGACCAACGC CTTGATTGAC AAGGATGGAT GGCTACATTC TGGAGACATA GCTTACTGGG 4200 

ACGAAGACGA ACACTTCTTC ATAGTTGACC GCTTGAAGTC TTTAATTAAA TACAAAGGAT 4260 

ATCAGGTGGC CCCCGCTGAA TTGGAATCGA TATTGTTACA ACACCCCAAC ATCTTCGACG 4320 

CGGGCGTGGC AGGTCTTCCC GACGATGACG CCGGTGAACT TCCCGCCGCC GTTGTTGTTT 4380 

TGGAGCACGG AAAGACGATG ACGGAAAAAG AGATCGTGGA TTACGTCGCC AGTCAAGTAA 4440 

CAACCGCGAA AAAGTTGCGC GGAGGAGTTG TGTTTGTGGA CGAAGTACCG AAAGGTCTTA 4500 

CCGGAAAACT CGACGCAAGA AAAATCAGAG AGATCCTCAT AAAGGCCAAG AAGGGCGGAA 4560 

AGTCCAAATT GTAAAATGTA ACTGTATTCA GCGATGACGA AATTCTTAGC TATTGTAATG 4620 

ACTCTAGAGG ATCTTTGTGA AGGAACCTTA CTTCTGTGGT GTGACATAAT TGGACAAACT 4680 

ACCTACAGAG ATTTAAAGCT CTAAGGTAAA TATAAAATTT TTAAGTGTAT AATGTGTTAA 4740 

ACTACTGATT CTAATTGTTT GTGTATTTTA GATTCCAACC TATGGAACTG ATGAATGGGA 4800 

GCAGTGGTGG AATGCCTTTA ATGAGGAAAA CCTGTTTTGC TCAGAAGAAA TGCCATCTAG 4860 

TGATGATGAG GCTACTGCTG ACTCTCAACA TTCTACTCCT CCAAAAAAGA AGAGAAAGGT 4920 

AGAAGACCCC AAGGACTTTC CTTCAGAATT GCTAAGTTTT TTGAGTCATG CTGTGTTTAG 4980 

TAATAGAACT CTTGCTTGCT TTGCTATTTA CACCACAAAG GAAAAAGCTG CACTGCTATA 5040 

CAAGAAAATT ATGGAAAAAT ATTCTGTAAC CTTTATAAGT AGGCATAACA GTTATAATCA 5100 

TAACATACTG TTTTTTCTTA CTCCACACAG GCATAGAGTG TCTGCTATTA ATAACTATGC 5160 

TCAAAAATTG TGTACCTTTA GCTTTTTAAT TTGTAAAGGG GTTAATAAGG AATATTTGAT 5220 

GTATAGTGCC TTGACTAGAG ATCATAATCA GCCATACCAC ATTTGTAGAG GTTTTACTTG 5280 

CTTTAAAAAA CCTCCCACAC CTCCCCCTGA ACCTGAAACA TAAAATGAAT GCAATTGTTG 5340 

TTGTTAACTT GTTTATTGCA GCTTATAATG GTTACAAATA AAGCAATAGC ATCACAAATT 5400 

TCACAAATAA AGCATTTTTT TCACTGCATT CTAGTTGTGG TTTGTCCAAA CTCATCAATG 5460 

TATCTTATCA TGTCTGGATC CCCAGGAAGC TCCTCTGTGT CCTCATAAAC CCTAACCTCC 5520 

TCTACTTGAG AGGACATTCC AATCATAGGC TGCCCATCCA CCCTCTGTGT CCTCCTGTTA 5580 

ATTAGGTCAC TTAACAAAAA GGAAATTGGG TAGGGGTTTT TCACAGACCG CTTTCTAAGG 5640 

GTAATTTTAA AATATCTGGG AAGTCCCTTC CACTGCTGTG TTCCAGAAGT GTTGGTAAAC 5700 

AGCCCACAAA TGTCAACAGC AGAAACATAC AAGCTGTCAG CTTTGCACAA GGGCCCAACA 5760 

CCCTGCTCAG CAAGAAGCAC TGTGGTTGCT GTGTTAGTAA TGTGCAAAAC AGGAGGCACA 5820 

TTTTCCCCAC CTGTGTAGGT TCCAAAATAT CTAGTGTTTT CATTTTTACT TGGATCAGGA 5880 

ACCCAGCACT CCACTGGATA AGCATTATCC TTATCCAAAA CAGCCTTGTG GTCAGTGTTC 5940 

ATCTGCTGAC TGTCAACTGT AGCATTTTTT GGGGTTACAG TTTGAGCAGG ATATTTGGTC 6000 

CTGTAGTTTG CTAACACACC CTGCAGCTCC AAAGGTTCCC CACCAACAGC AAAAAAATGA 6060 

AAATTTGACC CTTGAATGGG TTTTCCAGCA CCATTTTCAT GAGTTTTTTG TGTCCCTGAA 6120 

TGCAAGTTTA ACATAGCAGT TACCCCAATA ACCTCAGTTT TAACAGTAAC AGCTTCCCAC 6180 

ATCAAAATAT TTCCACAGGT TAAGTCCTCA TTTAAATTAG GCAAAGGAA 6229 
(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 10768 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 

AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080 

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATACCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740 

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 

ATAGTTAAGC CAGTATTCGA CCTCGAGGGA TCTTTGTGAA GGAACCTTAC TTCTGTGGTG 2160 

TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC TAAGGTAAAT ATAAAATTTT 2220 

TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG TGTATTTTAG ATTCCAACCT 2280 

ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA TGAGGAAAAC CTGTTTTGCT 2340 

CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA CTCTCAACAT TCTACTCCTC 2400 

CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC TTCAGAATTG CTAAGTTTTT 2460 

TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT TGCTATTTAC ACCACAAAGG 2520 

AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA TTCTGTAACC TTTATAAGTA 2580 

GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC TCCACACAGG CATAGAGTGT 2640 

CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG CTTTTTAATT TGTAAAGGGG 2700 

TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA TCATAATCAG CCATACCACA 2760 

TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC TCCCCCTGAA CCTGAAACAT 2820 

AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG CTTATAATGG TTACAAATAA 2880 

AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT CACTGCATTC TAGTTGTGGT 2940 

TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC GGCTGTGGAA TGTGTGTCAG 3000 

TTAGGGTGTG GAAAGTCCCC AGGCTCCCCA GCAGGCAGAA GTATGCAAAG CATGCATCTC 3060 

AATTAGTCAG CAACCAGGTG TGGAAAGTCC CCAGGCTCCC CAGCAGGCAG AAGTATGCAA 3120 

AGCATGCATC TCAATTAGTC AGCAACCATA GTCCCGCCCC TAACTCCGCC CATCCCGCCC 3180 

CTAACTCCGC CCAGTTCCGC CCATTCTCCG CCCCATGGCT GACTAATTTT TTTTATTTAT 3240 

GCAGAGGCCG AGGCCGCCTC GGCCTCTGAG CTATTCCAGA AGTAGTGAGG AGGCTTTTTT 3300 

GGAGGCCTAG GCTTTTGCAA AAAGCTTCAC GCTGCCGCAA GCACTCAGGG CGCAAGGGCT 3360 

GCTAAAGGAA GCGGAACACG TAGAAAGCCA GTCCGCAGAA ACGGTGCTGA CCCCGGATGA 3420 

ATGTCAGCTA CTGGGCTATC TGGACAAGGG AAAACGCAAG CGCAAAGAGA AAGCAGGTAG 3480 

CTTGCAGTGG GCTTACATGG CGATAGCTAG ACTGGGCGGT TTTATGGACA GCAAGCGAAC 3540 

CGGAATTGCC AGCTGGGGCG CCCTCTGGTA AGGTTGGGAA GCCCTGCAAA GTAAACTGGA 3600 

TGGCTTTCTT GCCGCCAAGG ATCTGATGGC GCAGGGGATC AAGATCTGAT CAAGAGACAG 3660 

GATGAGGATC GTTTCGCATG ATTGAACAAG ATGGATTGCA CGCAGGTTCT CCGGCCGCTT 3720 

GGGTGGAGAG GCTATTCGGC TATGACTGGG CACAACAGAC AATCGGCTGC TCTGATGCCG 3780 

CCGTGTTCCG GCTGTCAGCG CAGGGGCGCC CGGTTCTTTT TGTCAAGACC GACCTGTCCG 3840 

GTGCCCTGAA TGAACTGCAG GACGAGGCAG CGCGGCTATC GTGGCTGGCC ACGACGGGCG 3900 

TTCCTTGCGC AGCTGTGCTC GACGTTGTCA CTGAAGCGGG AAGGGACTGG CTGCTATTGG 3960 

GCGAAGTGCC GGGGCAGGAT CTCCTGTCAT CTCACCTTGC TCCTGCCGAG AAAGTATCCA 4020 

TCATGGCTGA TGCAATGCGG CGGCTGCATA CGCTTGATCC GGCTACCTGC CCATTCGACC 4080 

ACCAAGCGAA ACATCGCATC GAGCGAGCAC GTACTCGGAT GGAAGCCGGT CTTGTCGATC 4140 

AGGATGATCT GGACGAAGAG CATCAGGGGC TCGCGCCAGC CGAACTGTTC GCCAGGCTCA 4200 

AGGCGCGCAT GCCCGACGGC GAGGATCTCG TCGTGACCCA TGGCGATGCC TGCTTGCCGA 4260 

ATATCATGGT GGAAAATGGC CGCTTTTCTG GATTCATCGA CTGTGGCCGG CTGGGTGTGG 4320 

CGGACCGCTA TCAGGACATA GCGTTGGCTA CCCGTGATAT TGCTGAAGAG CTTGGCGGCG 4380 

AATGGGCTGA CCGCTTCCTC GTGCTTTACG GTATCGCCGC TCCCGATTCG CAGCGCATCG 4440 

CCTTCTATCG CCTTCTTGAC GAGTTCTTCT GAGCGGGACT CTGGGGTTCG AAATGACCGA 4500 

CCAAGCGACG CCCAACCTGC CATCACGAGA TTTCGATTCC ACCGCCGCCT TCTATGAAAG 4560 

GTTGGGCTTC GGAATCGTTT TCCGGGACGC CGGCTGGATG ATCCTCCAGC GCGGGGATCT 4620 

CATGCTGGAG TTCTTCGCCC ACCCCGGGCT CGATCCCCTC GCGAGTTGGT TCAGCTGCTG 4680 

CCTGAGGCTG GACGACCTCG CGGAGTTCTA CCGGCAGTGC AAATCCGTCG GCATCCAGGA 4740 

AACCAGCAGC GGCTATCCGC GCATCCATGC CCCCGAACTG CAGGAGTGGG GAGGCACGAT 4800 

GGCCGCTTTG GTCCCGGATC TTTGTGAAGG AACCTTACTT CTGTGGTGTG ACATAATTGG 4860 

ACAAACTACC TACAGAGATT TAAAGCTCTA AGGTAAATAT AAAATTTTTA AGTGTATAAT 4920 

GTGTTAAACT ACTGATTCTA ATTGTTTGTG TATTTTAGAT TCCAACCTAT GGAACTGATG 4980 

AATGGGAGCA GTGGTGGAAT GCCTTTAATG AGGAAAACCT GTTTTGCTCA GAAGAAATGC 5040 

CATCTAGTGA TGATGAGGCT ACTGCTGACT CTCAACATTC TACTCCTCCA AAAAAGAAGA 5100 

GAAAGGTAGA AGACCCCAAG GACTTTCCTT CAGAATTGCT AAGTTTTTTG AGTCATGCTG 5160 

TGTTTAGTAA TAGAACTCTT GCTTGCTTTG CTATTTACAC CACAAAGGAA AAAGCTGCAC 5220 

TGCTATACAA GAAAATTATG GAAAAATATT CTGTAACCTT TATAAGTAGG CATAACAGTT 5280 

ATAATCATAA CATACTGTTT TTTCTTACTC CACACAGGCA TAGAGTGTCT GCTATTAATA 5340 

ACTATGCTCA AAAATTGTGT ACCTTTAGCT TTTTAATTTG TAAAGGGGTT AATAAGGAAT 5400 

ATTTGATGTA TAGTGCCTTG ACTAGAGATC ATAATCAGCC ATACCACATT TGTAGAGGTT 5460 

TTACTTGCTT TAAAAAACCT CCCACACCTC CCCCTGAACC TGAAACATAA AATGAATGCA 5520 

ATTGTTGTTG TTAACTTGTT TATTGCAGCT TATAATGGTT ACAAATAAAG CAATAGCATC 5580 

ACAAATTTCA CAAATAAAGC ATTTTTTTCA CTGCATTCTA GTTGTGGTTT GTCCAAACTC 5640 

ATCAATGTAT CTTATCATGT CTGGATCCCC AGGAAGCTCC TCTGTGTCCT CATAAACCCT 5700 

AACCTCCTCT ACTTGAGAGG ACATTCCAAT CATAGGCTGC CCATCCACCC TCTGTGTCCT 5760 

CCTGTTAATT AGGTCACTTA ACAAAAAGGA AATTGGGTAG CCGTTTTTCA CAGACCGCTT 5820 

TCTAAGGGTA ATTTTAAAAT ATCTGGGAAG TCCCTTCCAC TGCTGTGTTC CAGAAGTGTT 5880 

GGTAAACAGC CCACAAATGT CAACAGCAGA AACATACAAG CTGTCAGCTT TGCACAAGGG 5940 

CCCAACACCC TGCTCATCAA GAAGCACTGT GGTTGCTGTG TTAGTAATGT GCAAAACAGG 6000 

AGGCACATTT TCCCCACCTG TGTAGGTTCC AAAATATCTA GTGTTTTCAT TTTTACTTGG 6060 

ATCAGGAACC CAGCACTCCA CTGGATAAGC ATTATCCTTA TCCAAAACAG CCTTGTGGTC 6120 

AGTGTTCATC TGCTGACTGT CAACTGTAGC ATTTTTTGGG GTTACAGTTT GAGCAGGATA 6180 

TTTGGTCCTG TAGTTTGCTA ACACACCCTG CAGCTCCAAA GGTTCCCCAC CAACAGCAAA 6240 

AAAATGAAAA TTTGACCCTT GAATGGGTTT TCCAGCACCA TTTTCATGAG TTTTTTGTGT 6300 

CCCTGAATGC AAGTTTAACA TAGCAGTTAC CCCAATAACC TCAGTTTTAA CAGTAACAGC 6360 

TTCCCACATC AAAATATTTC CACAGGTTAA GTCCTCATTT AAATTAGGCA AAGGAATTAT 6420 

ACACTCCGCT ATCGCTACGT GACTGGGTCA TGGCTGCGCC CCGACACCCG CCAACACCCG 6480 

CTGACGCGCC CTGACGGGCT TGTCTGCTCC CGGCATCCGC TTACAGACAA GCTGTGACCG 6540 

TCTCCGGGAG CTGCATGTGT CAGAGGTTTT CACCGTCATC ACCGAAACGC GCGAGGCAGC 6600 

GGATCATAAT CAGCCATACC ACATTTGTAG AGGTTTTACT TGCTTTAAAA AACCTCCCAC 6660 

ACCTCCCCCT GAACCTGAAA CATAAAATGA ATGCAATTGT TGTTGTTAAC TTGTTTATTG 6720 

CAGCTTATAA TGGTTACAAA TAAAGCAATA GCATCACAAA TTTCACAAAT AAAGCATTTT 6780 

TTTCACTGCA TTCTAGTTGT GGTTTGTCCA AACTCATCAA TGTATCTTAT CATGTCTGGA 6840 

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 6900 

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 6960 

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 7020 

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 7080 

CAGGCCAGAC GCCAACAAGG TAGGAGCTGG AGCATTCGGG CTGGGTTTCA CCCCACCGCA 7140 

CGGAGGCCTT TTGGGGTGGA GCCCTCAGGC TCAGGGCATA CTACAAACTT TGCCAGCAAA 7200 

TCCGCCTCCT GCCTCCACCA ATCGCCAGTC AGGAAGGCAG CCTACCCCGC TGTCTCCACC 7260 

TTTGAGAAAC ACTCATCCTC AGGCCATGCA GTGGAATTCC ACAACCTTCC ACCAAACTCT 7320 

GCAAGATCCC AGAGTGAGAG GCCTGTATTT CCCTGCTGGT GGCTCCAGTT CAGGAACAGT 7380 

AAACCCTGTT CTGACTACTG CCTCTCCCTT ATCGTCAATC TTCTCGAAAT TCCAGCTGGC 7440 

ATTCCGGTAC TGTTGGTAAA ATGGAAGACG CCAAAAACAT AAAGAAAGGC CCGGCGCCAT 7500 

TCTATCCTCT AGAGGATGGA ACCGCTGGAG AGCAACTGCA TAAGGCTATG AAGAGATACG 7560 

CCCTGGTTCC TGGAACAATT GCTTTTACAG ATGCACATAT CGAGGTGAAC ATCACGTACG 7620 

CGGAATACTT CGAAATGTCC GTTCGGTTGG CAGAAGCTAT GAAACGATAT GGGCTGAATA 7680 

CAAATCACAG AATCGTCGTA TGCAGTGAAA ACTCTCTTCA ATTCTTTATG CCGGTGTTGG 7740 

GCGCGTTATT TATCGGAGTT GCAGTTGCGC CCGCGAACGA CATTTATAAT GAACGTGAAT 7800 

TGCTCAACAG TATGAACATT TCGCAGCCTA CCGTAGTGTT TGTTTCCAAA AAGGGGTTGC 7860 

AAAAAATTTT GAACGTGCAA AAAAAATTAC CAATAATCCA GAAAATTATT ATCATGGATT 7920 

CTAAAACGGA TTACCAGGGA TTTCAGTCGA TGTACACGTT CGTCACATCT CATCTACCTC 7980 

CCGGTTTTAA TGAATACGAT TTTGTACCAG AGTCCTTTGA TCGTGACAAA ACAATTGCAC 8040 

TGATAATGAA TTCCTCTGGA TCTACTGGGT TACCTAAGGG TGTGGCCCTT CCGCATAGAA 8100 

CTGCCTGCGT CAGATTCTCG CATGCCAGAG ATCCTATTTT TGGCAATCAA ATCATTCCGG 8160 

ATACTGCGAT TTTAAGTGTT GTTCCATTCC ATCACGGTTT TGGAATGTTT ACTACACTCG 8220 

GATATTTGAT ATGTGGATTT CGAGTCGTCT TAATGTATAG ATTTGAAGAA GAGCTGTTTT 8280 

TACGATCCCT TCAGGATTAC AAAATTCAAA GTGCGTTGCT AGTACCAACC CTATTTTCAT 8340 

TCTTCGCCAA AAGCACTCTG ATTGACAAAT ACGATTTATC TAATTTACAC GAAATTGCTT 8400 

CTGGGGGCGC ACCTCTTTCG AAAGAAGTCG GGGAAGCGGT TGCAAAACGC TTCCATCTTC 8460 

CAGGGATACG ACAAGGATAT GGGCTCACTG AGACTACATC AGCTATTCTG ATTACACCCG 8520 

AGGGGGATGA TAAACCGGGC GCGGTCGGTA AAGTTGTTCC ATTTTTTGAA GCGAAGGTTG 8580 

TGGATCTGGA TACCGGGAAA ACGCTGGGCG TTAATCAGAG AGGCGAATTA TGTGTCAGAG 8640 

GACCTATGAT TATGTCCGGT TATGTAAACA ATCCGGAAGC GACCAACGCC TTGATTGACA 8700 

AGGATGGATG GCTACATTCT GGAGACATAG CTTACTGGGA CGAAGACGAA CACTTCTTCA 8760 

TAGTTGACCG CTTGAAGTCT TTAATTAAAT ACAAAGGATA TCAGGTGGCC CCCGCTGAAT 8820

TGGAATCGAT ATTGTTACAA CACCCCAACA TCTTCGACGC GGGCGTGGCA GGTCTTCCCG 8880 

ACGATGACGC CGGTGAACTT CCCGCCGCCG TTGTTGTTTT GGAGCACGGA AAGACGATGA 8940

CGGAAAAAGA GATCGTGGAT TACGTCGCCA GTCAAGTAAC AACCGCGAAA AAGTTGCGCG 9000 

GAGGAGTTGT GTTTGTGGAC GAAGTACCGA AAGGTCTTAC CGGAAAACTC GACGCAAGAA 9060 

AAATCAGAGA GATCCTCATA AAGGCCAAGA AGGGCGGAAA GTCCAAATTG TAAAATGTAA 9120 

CTGTATTCAG CGATGACGAA ATTCTTAGCT ATTGTAATGA CTCTAGAGGA TCTTTGTGAA 9180 

GGAACCTTAC TTCTGTGGTG TGACATAATT GGACAAACTA CCTACAGAGA TTTAAAGCTC 9240
TAAGGTAAAT ATAAAATTTT TAAGTGTATA ATGTGTTAAA CTACTGATTC TAATTGTTTG 9300

TGTATTTTAG ATTCCAACCT ATGGAACTGA TGAATGGGAG CAGTGGTGGA ATGCCTTTAA 9360 

TGAGGAAAAC CTGTTTTGCT CAGAAGAAAT GCCATCTAGT GATGATGAGG CTACTGCTGA 9420 

CTCTCAACAT TCTACTCCTC CAAAAAAGAA GAGAAAGGTA GAAGACCCCA AGGACTTTCC 9480 

TTCAGAATTG CTAAGTTTTT TGAGTCATGC TGTGTTTAGT AATAGAACTC TTGCTTGCTT 9540 

TGCTATTTAC ACCACAAAGG AAAAAGCTGC ACTGCTATAC AAGAAAATTA TGGAAAAATA 9600 

TTCTGTAACC TTTATAAGTA GGCATAACAG TTATAATCAT AACATACTGT TTTTTCTTAC 9660 

TCCACACAGG CATAGAGTGT CTGCTATTAA TAACTATGCT CAAAAATTGT GTACCTTTAG 9720 

CTTTTTAATT TGTAAAGGGG TTAATAAGGA ATATTTGATG TATAGTGCCT TGACTAGAGA 9780 

TCATAATCAG CCATACCACA TTTGTAGAGG TTTTACTTGC TTTAAAAAAC CTCCCACACC 9840 

TCCCCCTGAA CCTGAAACAT AAAATGAATG CAATTGTTGT TGTTAACTTG TTTATTGCAG 9900 

CTTATAATGG TTACAAATAA AGCAATAGCA TCACAAATTT CACAAATAAA GCATTTTTTT 9960 

CACTGCATTC TAGTTGTGGT TTGTCCAAAC TCATCAATGT ATCTTATCAT GTCTGGATCC 10020 

CCAGGAAGCT CCTCTGTGTC CTCATAAACC CTAACCTCCT CTACTTGAGA GGACATTCCA 10080 

ATCATAGGCT GCCCATCCAC CCTCTGTGTC CTCCTGTTAA TTAGGTCACT TAACAAAAAG 10140 

GAAATTGGGT AGGGGTTTTT CACAGACCGC TTTCTAAGGG TAATTTTAAA ATATCTGGGA 10200 

AGTCCCTTCC ACTGCTGTGT TCCAGAAGTG TTGGTAAACA GCCCACAAAT GTCAACAGCA 10260 

GAAACATACA AGCTGTCAGC TTTGCACAAG GGCCCAACAC CCTGCTCAGC AAGAAGCACT 10320

GTGGTTGCTG TGTTAGTAAT GTGCAAAACA GGAGGCACAT TTTCCCCACC TGTGTAGGTT 10380 

CCAAAATATC TAGTGTTTTC ATTTTTACTT GGATCAGGAA CCCAGCACTC CACTGGATAA 10440 

GCATTATCCT TATCCAAAAC AGCCTTGTGG TCAGTGTTCA TCTGCTGACT GTCAACTGTA 10500 

GCATTTTTTG GGGTTACAGT TTGAGCAGGA TATTTGGTCC TGTAGTTTGC TAACACACCC 10560 

TGCAGCTCCA AAGGTTCCCC ACCAACAGCA AAAAAATGAA AATTTGACCC TTGAATGGGT 10620 

TTTCCAGCAC CATTTTCATG AGTTTTTTGT GTCCCTGAAT GCAAGTTTAA CATAGCAGTT 10680 

ACCCCAATAA CCTCAGTTTT AACAGTAACA GCTTCCCACA TCAAAATATT TCCACAGGTT 10740 

AAGTCCTCAT TTAAATTAGG CAAAGGAA 10768 
(2) INFORMATION FOR SEQ ID NO: 25:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 6464 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: circular 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

TTCTTGAAGA CGAAAGGGCC TCGTGATACG CCTATTTTTA TAGGTTAATG TCATGATAAT 60 

AATGGTTTCT TAGACGTCAG GTGGCACTTT TCGGGGAAAT GTGCGCGGAA CCCCTATTTG 120 

TTTATTTTTC TAAATACATT CAAATATGTA TCCGCTCATG AGACAATAAC CCTGATAAAT 180 

GCTTCAATAA TATTGAAAAA GGAAGAGTAT GAGTATTCAA CATTTCCGTG TCGCCCTTAT 240 

TCCCTTTTTT GCGGCATTTT GCCTTCCTGT TTTTGCTCAC CCAGAAACGC TGGTGAAAGT 300 

AAAAGATGCT GAAGATCAGT TGGGTGCACG AGTGGGTTAC ATCGAACTGG ATCTCAACAG 360 

CGGTAAGATC CTTGAGAGTT TTCGCCCCGA AGAACGTTTT CCAATGATGA GCACTTTTAA 420 

AGTTCTGCTA TGTGGCGCGG TATTATCCCG TGTTGACGCC GGGCAAGAGC AACTCGGTCG 480 

CCGCATACAC TATTCTCAGA ATGACTTGGT TGAGTACTCA CCAGTCACAG AAAAGCATCT 540 

TACGGATGGC ATGACAGTAA GAGAATTATG CAGTGCTGCC ATAACCATGA GTGATAACAC 600 

TGCGGCCAAC TTACTTCTGA CAACGATCGG AGGACCGAAG GAGCTAACCG CTTTTTTGCA 660 

CAACATGGGG GATCATGTAA CTCGCCTTGA TCGTTGGGAA CCGGAGCTGA ATGAAGCCAT 720 

ACCAAACGAC GAGCGTGACA CCACGATGCC TGCAGCAATG GCAACAACGT TGCGCAAACT 780 

ATTAACTGGC GAACTACTTA CTCTAGCTTC CCGGCAACAA TTAATAGACT GGATGGAGGC 840 

GGATAAAGTT GCAGGACCAC TTCTGCGCTC GGCCCTTCCG GCTGGCTGGT TTATTGCTGA 900 

TAAATCTGGA GCCGGTGAGC GTGGGTCTCG CGGTATCATT GCAGCACTGG GGCCAGATGG 960 

TAAGCCCTCC CGTATCGTAG TTATCTACAC GACGGGGAGT CAGGCAACTA TGGATGAACG 1020 
AAATAGACAG ATCGCTGAGA TAGGTGCCTC ACTGATTAAG CATTGGTAAC TGTCAGACCA 1080

AGTTTACTCA TATATACTTT AGATTGATTT AAAACTTCAT TTTTAATTTA AAAGGATCTA 1140 

GGTGAAGATC CTTTTTGATA ATCTCATGAC CAAAATCCCT TAACGTGAGT TTTCGTTCCA 1200 

CTGAGCGTCA GACCCCGTAG AAAAGATCAA AGGATCTTCT TGAGATCCTT TTTTTCTGCG 1260 

CGTAATCTGC TGCTTGCAAA CAAAAAAACC ACCGCTACCA GCGGTGGTTT GTTTGCCGGA 1320 

TCAAGAGCTA CCAACTCTTT TTCCGAAGGT AACTGGCTTC AGCAGAGCGC AGATAGCAAA 1380 

TACTGTCCTT CTAGTGTAGC CGTAGTTAGG CCACCACTTC AAGAACTCTG TAGCACCGCC 1440 

TACATACCTC GCTCTGCTAA TCCTGTTACC AGTGGCTGCT GCCAGTGGCG ATAAGTCGTG 1500 

TCTTACCGGG TTGGACTCAA GACGATAGTT ACCGGATAAG GCGCAGCGGT CGGGCTGAAC 1560 

GGGGGGTTCG TGCACACAGC CCAGCTTGGA GCGAACGACC TACACCGAAC TGAGATACCT 1620 

ACAGCGTGAG CATTGAGAAA GCGCCACGCT TCCCGAAGGG AGAAAGGCGG ACAGGTATCC 1680 

GGTAAGCGGC AGGGTCGGAA CAGGAGAGCG CACGAGGGAG CTTCCAGGGG GAAACGCCTG 1740

GTATCTTTAT AGTCCTGTCG GGTTTCGCCA CCTCTGACTT GAGCGTCGAT TTTTGTGATG 1800 

CTCGTCAGGG GGGCGGAGCC TATGGAAAAA CGCCAGCAAC GCGGCCTTTT TACGGTTCCT 1860 

GGCCTTTTGC TGGCCTTTTG CTCACATGTT CTTTCCTGCG TTATCCCCTG ATTCTGTGGA 1920 

TAACCGTATT ACCGCCTTTG AGTGAGCTGA TACCGCTCGC CGCAGCCGAA CGACCGAGCG 1980 

CAGCGAGTCA GTGAGCGAGG AAGCGGAAGA GCGCCTGATG CGGTATTTTC TCCTTACGCA 2040 

TCTGTGCGGT ATTTCACACC GCATATGGTG CACTCTCAGT ACAATCTGCT CTGATGCCGC 2100 

ATAGTTAAGC CAGTATACAC TCCGCTATCG CTACGTGACT GGGTCATGGC TGCGCCCCGA 2160 

CACCCGCCAA CACCCGCTGA CGCGCCCTGA CGGGCTTGTC TGCTCCCGGC ATCCGCTTAC 2220 

AGACAAGCTG TGACCGTCTC CGGGAGCTGC ATGTGTCAGA GGTTTTCACC GTCATCACCG 2280 

AAACGCGCGA GGCAGCGGAT CATAATCAGC CATACCACAT TTGTAGAGGT TTTACTTGCT 2340 

TTAAAAAACC TCCCACACCT CCCCCTGAAC CTGAAACATA AAATGAATGC AATTGTTGTT 2400 

GTTAACTTGT TTATTGCAGC TTATAATGGT TACAAATAAA GCAATAGCAT CACAAATTTC 2460 

ACAAATAAAG CATTTTTTTC ACTGCATTCT AGTTGTGGTT TGTCCAAACT CATCAATGTA 2520 

TCTTATCATG TCTGGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 2580 
AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 2640

AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 2700 

AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 2760 

TATCATGTCT GGATCCCAGG CCAGACGCCA ACAAGGTAGG AGCTGGAGCA TTCGGGCTGG 2820 

GTTTCACCCC ACCGCACGGA GGCCTTTTGG GGTGGAGCCC TCAGGCTCAG GGCATACTAC 2880 

AAACTTTGCC AGCAAATCCG CCTCCTGCCT CCACCAATCG CCAGTCAGGA AGGCAGCCTA 2940 

CCCCGCTGTC TCCACCTTTG AGAAACACTC ATCCTCAGGC CATGCAGTGG AATTCCACAA 3000 

CCTTCCACCA AACTCTGCAA GATCCCAGAG TGAGAGGCCT GTATTTCCCT GCTGGTGGCT 3060 

CCAGTTCAGG AACAGTAAAC CCTGTTCTGA CTACTGCCTC TCCCTTATCG TCAATCTTCT 3120 

CGAAATTCCA GCTGGCATTC CGGTACTGTT GGTAAAATGG AAGACGCCAA AAACATAAAG 3180 

AAAGGCCCGG CGCCATTCTA TCCTCTAGAG GATGGAACCG CTGGAGAGCA ACTGCATAAG 3240 

GCTATGAAGA GATACGCCCT GGTTCCTGGA ACAATTGCTT TTACAGATGC ACATATCGAG 3300 

GTGAACATCA CGTACGCGGA ATACTTCGAA ATGTCCGTTC GGTTGGCAGA AGCTATGAAA 3360 

CGATATGGGC TGAATACAAA TCACAGAATC GTCGTATGCA GTGAAAACTC TCTTCAATTC 3420 

TTTATGCCGG TGTTGGGCGC GTTATTTATC GGAGTTGCAG TTGCGCCCGC GAACGACATT 3480 

TATAATGAAC GTGAATTGCT CAACAGTATG AACATTTCGC AGCCTACCGT AGTGTTTGTT 3540 

TCCAAAAAGG GGTTGCAAAA AATTTTGAAC GTGCAAAAAA AATTACCAAT AATCCAGAAA 3600 

ATTATTATCA TGGATTCTAA AACGGATTAC CAGGGATTTC AGTCGATGTA CACGTTCGTC 3660 

ACATCTCATC TACCTCCCGG TTTTAATGAA TACGATTTTG TACCAGAGTC CTTTGATCGT 3720 

GACAAAACAA TTGCACTGAT AATGAATTCC TCTGGATCTA CTGGGTTACC TAAGGGTGTG 3780 

GCCCTTCCGC ATAGAACTGC CTGCGTCAGA TTCTCGCATG CCAGAGATCC TATTTTTGGC 3840 

AATCAAATCA TTCCGGATAC TGCGATTTTA AGTGTTGTTC CATTCCATCA CGGTTTTGGA 3900 

ATGTTTACTA CACTCGGATA TTTGATATGT GGATTTCGAG TCGTCTTAAT GTATAGATTT 3960 

GAAGAAGAGC TGTTTTTACG ATCCCTTCAG GATTACAAAA TTCAAAGTGC GTTGCTAGTA 4020 

CCAACCCTAT TTTCATTCTT CGCCAAAAGC ACTCTGATTG ACAAATACGA TTTATCTAAT 4080 

TTACACGAAA TTGCTTCTGG GGGCGCACCT CTTTCGAAAG AAGTCGGGGA AGCGGTTGCA 4140 
AAACGCTTCC ATCTTCCAGG GATACGACAA GGATATGGGC TCACTGAGAC TACATCAGCT 4200
ATTCTGATTA CACCCGAGGG GGATGATAAA CCGGGCGCGG TCGGTAAAGT TGTTCCATTT 4260
TTTGAAGCGA AGGTTGTGGA TCTGGATACC GGGAAAACGC TGGGCGTTAA TCAGAGAGGC 4320
GAATTATGTG TCAGAGGACC TATGATTATG TCCGGTTATG TAAACAATCC GGAAGCGACC 4380
AACGCCTTGA TTGACAAGGA TGGATGGCTA CATTCTGGAG ACATAGCTTA CTGGGACGAA 4440
GACGAACACT TCTTCATAGT TGACCGCTTG AAGTCTTTAA TTAAATACAA AGGATATCAG 4500
GTGGCCCCCG CTGAATTGGA ATCGATATTG TTACAACACC CCAACATCTT CGACGCGGGC 4560
GTGGCAGGTC TTCCCGACGA TGACGCCGGT GAACTTCCCG CCGCCGTTGT TGTTTTGGAG 4620
CACGGAAAGA CGATGACGGA AAAAGAGATC GTGGATTACG TCGCCAGTCA AGTAACAACC 4680
GCGAAAAAGT TGCGCGGAGG AGTTGTGTTT GTGGACGAAG TACCGAAAGG TCTTACCGGA 4740
AAACTCGACG CAAGAAAAAT CAGAGAGATC CTCATAAAGG CCAAGAAGGG CGGAAAGTCC 4800
AAATTGTAAA ATGTAACTGT ATTCAGCGAT GACGAAATTC TTAGCTATTG TAATGACTCT 4860
AGAGGATCTT TGTGAAGGAA CCTTACTTCT GTGGTGTGAC ATAATTGGAC AAACTACCTA 4920
CAGAGATTTA AAGCTCTAAG GTAAATATAA AATTTTTAAG TGTATAATGT GTTAAACTAC 4980
TGATTCTAAT TGTTTGTGTA TTTTAGATTC CAACCTATGG AACTGATGAA TGGGAGCAGT 5040
GGTGGAATGC CTTTAATGAG GAAAACCTGT TTTGCTCAGA AGAAATGCCA TCTAGTGATG 5100
ATGAGGCTAC TGCTGACTCT CAACATTCTA CTCCTCCAAA AAAGAAGAGA AAGGTAGAAG 5160
ACCCCAAGGA CTTTCCTTCA GAATTGCTAA GTTTTTTGAG TCATGCTGTG TTTAGTAATA 5220
GAACTCTTGC TTGCTTTGCT ATTTACACCA CAAAGGAAAA AGCTGCACTG CTATACAAGA 5280
AAATTATGGA AAAATATTCT GTAACCTTTA TAAGTAGGCA TAACAGTTAT AATCATAACA 5340
TACTGTTTTT TCTTACTCCA CACAGGCATA GAGTGTCTGC TATTAATAAC TATGCTCAAA 5400
AATTGTGTAC CTTTAGCTTT TTAATTTGTA AAGGGGTTAA TAAGGAATAT TTGATGTATA 5460
GTGCCTTGAC TAGAGATCAT AATCAGCCAT ACCACATTTG TAGAGGTTTT ACTTGCTTTA 5520
AAAAACCTCC CACACCTCCC CCTGAACCTG AAACATAAAA TGAATGCAAT TGTTGTTGTT 5580
AACTTGTTTA TTGCAGCTTA TAATGGTTAC AAATAAAGCA ATAGCATCAC AAATTTCACA 5640
AATAAAGCAT TTTTTTCACT GCATTCTAGT TGTGGTTTGT CCAAACTCAT CAATGTATCT 5700
TATCATGTCT GGATCCCCAG GAAGCTCCTC TGTGTCCTCA TAAACCCTAA CCTCCTCTAC 5760

TTGAGAGGAC ATTCCAATCA TAGGCTGCCC ATCCACCCTC TGTGTCCTCC TGTTAATTAG 5820 

GTCACTTAAC AAAAAGGAAA TTGGGTAGGG GTTTTTCACA GACCGCTTTC TAAGGGTAAT 5880 

TTTAAAATAT CTGGGAAGTC CCTTCCACTG CTGTGTTCCA GAAGTGTTGG TAAACAGCCC 5940 

ACAAATGTCA ACAGCAGAAA CATACAAGCT GTCAGCTTTG CACAAGGGCC CAACACCCTG 6000 

CTCAGCAAGA AGCACTGTGG TTGCTGTGTT AGTAATGTGC AAAACAGGAG GCACATTTTC 6060 

CCCACCTGTG TAGGTTCCAA AATATCTAGT GTTTTCATTT TTACTTGGAT CAGGAACCCA 6120 

GCACTCCACT GGATAAGCAT TATCCTTATC CAAAACAGCC TTGTGGTCAG TGTTCATCTG 6180 

CTGACTGTCA ACTGTAGCAT TTTTTGGGGT TACAGTTTGA GCAGGATATT TGGTCCTGTA 6240 

GTTTGCTAAC ACACCCTGCA GCTCCAAAGG TTCCCCACCA ACAGCAAAAA AATGAAAATT 6300 

TGACCCTTGA ATGGGTTTTC CAGCACCATT TTCATGAGTT TTTTGTGTCC CTGAATGCAA 6360 

GTTTAACATA GCAGTTACCC CAATAACCTC AGTTTTAACA GTAACAGCTT CCCACATCAA 6420 

AATATTTCCA CAGGTTAAGT CCTCATTTAA ATTAGGCAAA GGAA 6464 
(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26: 
TGASTCA 7 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
TGGNNNNNNN GCCCAA 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 5 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
TGGCA 

(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
TGACACA 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30:
TGAGTCA 

(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31:
TGANACA 

(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 7 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: DNA (genomic)
(iii) HYPOTHETICAL: NO
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32:
TGATACA 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 8 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: double

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA (genomic) 
(iii) HYPOTHETICAL: NO 
(iv) ANTI-SENSE: NO

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
CCNTGTNT 
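
Note on the degenerate symbols: SEQ ID NOs 26, 27, 31 and 33 contain the symbols S and N. Under the standard IUPAC nucleotide code, S denotes G or C and N denotes any base, so SEQ ID NO: 26 (TGASTCA) covers SEQ ID NO: 30 (TGAGTCA), and SEQ ID NO: 31 (TGANACA) covers SEQ ID NOs 29 and 32. The Python sketch below is illustrative only and is not part of the sequence listing. The function names `motif_to_regex` and `find_motif` are hypothetical, and the sketch assumes that only the strand as written is scanned (no reverse complement).

```python
import re

# Standard IUPAC nucleotide codes mapped to regular-expression classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_to_regex(motif):
    """Compile a degenerate motif such as TGASTCA into a regex.

    The pattern is wrapped in a lookahead so that overlapping
    occurrences are all reported by finditer().
    """
    body = "".join(IUPAC[base] for base in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_motif(motif, sequence):
    """Return 1-based start positions of the motif in the sequence.

    Whitespace is removed first, so the blocks of ten bases used in a
    sequence listing can be pasted in unchanged.
    """
    cleaned = "".join(sequence.split()).upper()
    return [m.start() + 1 for m in motif_to_regex(motif).finditer(cleaned)]
```

For example, `find_motif("TGASTCA", "AATGAGTCAT TGACTCA")` reports one hit for each of the two sequences that the symbol S allows.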
WE CLAIM:

1. A method for quantifying the amount of transforming
growth factor-β (TGF-β) in a liquid sample, which method
comprises:

(a) incubating said liquid sample together with
eucaryotic cells that contain a TGF-β responsive expression
vector having a gene encoding luciferase for a predetermined
time period sufficient for said eucaryotic cells to express a
detectable amount of said luciferase;

(b) measuring the amount of said luciferase
expressed during said time period; and

(c) determining the amount of TGF-β present in
said sample by comparing the measured amount of said luciferase
against a reference curve.

2. The method in accordance with claim 1 wherein the
reference curve represents a series of measured amounts of said
luciferase produced from a series of known concentrations of
TGF-β by said eucaryotic cells.
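
By way of illustration only, the comparison against a reference curve recited in claims 1 and 2 can be sketched in Python as piecewise-linear interpolation between standards. This is a minimal sketch under stated assumptions, not the method of the specification: the function name `concentration_from_curve`, the units, and the example values are hypothetical, and the luciferase signal is assumed to rise strictly with TGF-β concentration over the range of the standards.

```python
from bisect import bisect_left


def concentration_from_curve(standards, signal):
    """Estimate a concentration from a measured luciferase signal.

    standards: (known_concentration, measured_signal) pairs obtained
    from the same cells. Assumption: the signal increases strictly
    with concentration, so the curve can be inverted.
    """
    if len(standards) < 2:
        raise ValueError("at least two reference points are required")
    points = sorted(standards, key=lambda p: p[1])
    signals = [s for _, s in points]
    if any(b <= a for a, b in zip(signals, signals[1:])):
        raise ValueError("reference signals must be strictly increasing")
    if not signals[0] <= signal <= signals[-1]:
        # No extrapolation: the sample should be diluted or re-assayed.
        raise ValueError("signal lies outside the reference curve")
    i = bisect_left(signals, signal)
    if signals[i] == signal:
        return points[i][0]
    (c0, s0), (c1, s1) = points[i - 1], points[i]
    # Linear interpolation between the two bracketing standards.
    return c0 + (c1 - c0) * (signal - s0) / (s1 - s0)


# Hypothetical standards: (concentration, relative light units).
example_standards = [(0.0, 100.0), (10.0, 300.0), (50.0, 1100.0)]
example_estimate = concentration_from_curve(example_standards, 700.0)
```

Refusing to extrapolate is a design choice of the sketch; a fitted dose-response model could be substituted for the linear segments.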

3. The method in accordance with claim 1 wherein said
eucaryotic cells are mammalian cells.

4. The method in accordance with claim 3 wherein said
mammalian cells are members of the group consisting of mink
lung epithelial cells, HeLa cells, Chinese hamster ovary cells,
Hep3B cells, GM7373 cells, and NIH 3T3 cells.

5. The method in accordance with claim 1 wherein the
TGF-β responsive expression vector is a plasmid comprising, in
the direction of transcription, a regulatory region that
includes at least one TGF-β inducible response element that is
operatively linked to a promoter, and a structural region
downstream of said promoter, said response element being
capable of inducing dose-dependent luciferase activity and said
structural region coding for said luciferase.

6. The method in accordance with claim 5 wherein said
plasmid includes a nucleotide sequence that corresponds to a
sequence selected from the group consisting of SEQ ID NOs 1-10.

7. The method in accordance with claim 5 wherein said
plasmid has the identifying characteristics of a plasmid
selected from the group consisting of plasmid ATCC Accession
Number 75627, plasmid ATCC Accession Number 74628 and plasmid
ATCC Accession Number 75629.

8. The method in accordance with claim 5 wherein said
TGF-β inducible response element comprises a nucleotide
sequence that corresponds to a sequence selected from the group
consisting of SEQ ID NOs 11-17.

9. The method in accordance with claim 5 wherein said 
promoter comprises a nucleotide sequence that corresponds to a 
sequence selected from the group consisting of SEQ ID NOs 18 
and 19.

10. The method in accordance with claim 1 wherein said
eucaryotic cells are stably transformed cells that contain said
TGF-β responsive vector, and wherein said vector also includes
a gene encoding a selectable marker.

11. The method in accordance with claim 10 wherein said
vector is a plasmid comprising a nucleotide sequence that
corresponds to a sequence selected from the group consisting of
SEQ ID NOs 1-6.

12. The method in accordance with claim 1 wherein said
eucaryotic cells are transiently transformed cells that contain
said TGF-β responsive vector, and wherein said vector is a
plasmid comprising a nucleotide sequence that corresponds to a
sequence selected from the group consisting of SEQ ID NOs 7-10.

13. The method in accordance with claim 1 wherein said
liquid sample is selected from the group consisting of a body
fluid, culture medium and a tissue extract.

14. A method for quantifying the amount of transforming
growth factor-β (TGF-β) in a liquid sample comprising:

(a) providing, in eucaryotic cells capable of
expressing an indicator molecule, a plasmid comprising, in the
direction of transcription, a regulatory region that includes
at least one TGF-β inducible response element that is
operatively linked to a promoter, and a structural region
downstream of said promoter, said response element being
capable of inducing dose-dependent indicator molecule activity
and said structural region coding for said indicator molecule;

(b) incubating said liquid sample with said
eucaryotic cells for a predetermined time period sufficient for
said eucaryotic cells to express a detectable amount of said
indicator molecule;

(c) measuring the amount of said indicator
molecule expressed during said time period; and

(d) comparing the measured amount of said
indicator molecule produced in step (c) with the amount of
indicator molecule produced in a control assay performed
according to steps (a) through (c) by treating said liquid
sample with an anti-TGF-β antibody to obtain a net measured
amount of said indicator molecule induced by said TGF-β.
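
By way of illustration only, the comparison of step (d) of claim 14 amounts to subtracting the signal of the antibody-treated control from the signal of the untreated sample. The Python sketch below is a minimal example under stated assumptions: the function name `net_induced_signal` is hypothetical, replicate readings are averaged, and a negative difference is reported as zero. None of these choices is prescribed by the claim.

```python
from statistics import mean


def net_induced_signal(sample_readings, antibody_control_readings):
    """Net indicator signal attributable to TGF-beta.

    sample_readings: indicator readings for the untreated sample.
    antibody_control_readings: readings for the same sample treated
    with an anti-TGF-beta antibody (the parallel control assay).
    """
    if not sample_readings or not antibody_control_readings:
        raise ValueError("both assays need at least one reading")
    net = mean(sample_readings) - mean(antibody_control_readings)
    # Assumption of this sketch: a negative difference is treated as
    # no detectable TGF-beta-induced signal.
    return max(net, 0.0)
```

The net value, rather than the raw reading, is what would then be compared against a reference curve as in claim 28.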

15. The method in accordance with claim 14 wherein said
liquid sample contains an isoform of TGF-β selected from the
group consisting of TGF-β1, TGF-β2 and TGF-β3.

16. The method in accordance with claim 14 wherein said
liquid sample is selected from the group consisting of a body
fluid, culture medium and a tissue extract.

17. The method in accordance with claim 14 wherein said
eucaryotic cell is a mammalian cell.

18. The method in accordance with claim 14 wherein said 
mammalian cell is selected from the group consisting of mink 
lung epithelial cells, HeLa cells, Chinese Hamster Ovary cells, 
Hep3B cells, GM7373 cells and NIH 3T3 cells. 

19. The method in accordance with claim 14 wherein said
indicator molecule is luciferase.

20. The method in accordance with claim 14 wherein said 
plasmid comprises a nucleotide sequence that corresponds to a 
sequence selected from the group consisting of SEQ ID NOs 1-10.

21. The method in accordance with claim 14 wherein said
TGF-β inducible response element comprises a nucleotide
sequence that corresponds to a sequence selected from the group
consisting of SEQ ID NOs 11-17.

22. The method in accordance with claim 14 wherein said
promoter comprises a nucleotide sequence that corresponds to a
sequence selected from the group consisting of SEQ ID NOs 18
and 19.

23. The method in accordance with claim 14 wherein said
plasmid has the identifying characteristics of a plasmid
selected from the group consisting of plasmid ATCC Accession
Number 75627, plasmid ATCC Accession Number 74628 and plasmid
ATCC Accession Number 75629.

24. The method in accordance with claim 14 wherein said
eucaryotic cells are stably transformed cells that contain said
plasmid, and wherein said plasmid contains a gene encoding a
selectable marker for the selection of said stably transformed
cells.

25. The method in accordance with claim 24 wherein said
plasmid comprises a nucleotide sequence that corresponds to a
sequence selected from the group consisting of SEQ ID NOs 1-6.

26. The method in accordance with claim 14 wherein said
eucaryotic cells are stably transformed cells that contain the
TGF-β response element having the nucleotide sequence in SEQ ID
NO 11, and wherein said cells correspond to cells on deposit
with ATCC having the ATCC Accession Number CRL 11508.

27. The method in accordance with claim 14 wherein
eucaryotic cells comprise transiently transformed cells that
contain said plasmid comprising a nucleotide sequence that
corresponds to a sequence selected from the group consisting of
SEQ ID NOs 7-10.

28. The method in accordance with claim 14 further
comprising the step of:

(e) determining the amount of said TGF-β present in
said sample by comparing the measured amount of said indicator
molecule obtained in step (d) against a reference curve.
29. The method in accordance with claim 28 wherein said
reference curve represents a series of measured amounts of said
indicator molecule produced from a series of known
concentrations of TGF-β in said eucaryotic cells.

30. A plasmid vector in substantially pure form capable
of causing expression of an indicator molecule in a eucaryotic
cell, said plasmid including in the direction of transcription,
a first nucleotide sequence comprising a regulatory region that
includes at least one TGF-β inducible response element
operatively linked to a promoter, a second nucleotide sequence
comprising a structural region downstream of said promoter and
coding for said indicator molecule, and a third nucleotide
sequence comprising a gene encoding a selectable marker for the
selection of a stably transformed cell, said response element
being capable of inducing dose-dependent luciferase activity
and said structural region coding for said luciferase.

31. The plasmid vector in accordance with claim 30 
capable of expressing a chemiluminescent indicator molecule. 

32. The plasmid vector in accordance with claim 30
wherein said plasmid comprises a nucleotide sequence that
corresponds to a sequence selected from the group consisting of
SEQ ID NOs 1-6.

33. The plasmid vector in accordance with claim 30
wherein said TGF-β inducible response element comprises a
nucleotide sequence that corresponds to a sequence selected
from the group consisting of SEQ ID NOs 11-17.

34. The plasmid vector in accordance with claim 30
wherein said promoter comprises a nucleotide sequence that
corresponds to a sequence selected from the group consisting of
SEQ ID NOs 18 and 19.

35. The plasmid vector in accordance with claim 30 
wherein said gene comprises the nucleotide sequence in SEQ ID 
NO 20. 

36. A plasmid vector in substantially pure form and
capable of causing expression of luciferase in a eucaryotic
cell, said plasmid comprising in the direction of
transcription, a regulatory region that includes at least one
TGF-β inducible response element that is operatively linked to
a promoter, and a structural region downstream of said promoter 
for transcription therefrom and coding for said luciferase, 
said response element being capable of inducing dose-dependent 
luciferase activity and said structural region coding for said 
luciferase, and wherein said plasmid has the identifying 
characteristics of a plasmid selected from the group consisting 
of plasmid ATCC Accession Number 75627, plasmid ATCC Accession 
Number 74628 and plasmid ATCC Accession Number 75629. 

37. A plasmid vector in substantially pure form and
capable of causing expression of luciferase in a eucaryotic
cell, said plasmid comprising in the direction of
transcription, a regulatory region that includes at least one
TGF-β inducible response element that is operatively linked to
a promoter, and a structural region downstream of said promoter
for transcription therefrom and coding for said luciferase,
said response element being capable of inducing dose-dependent
luciferase activity and said structural region coding for said
luciferase, and wherein said plasmid comprises a nucleotide
sequence that corresponds to a sequence selected from the group
consisting of SEQ ID NOs 7-10.

38. A eucaryotic cell containing a plasmid vector having
a nucleotide sequence that corresponds to a sequence selected
from the group consisting of SEQ ID NOs 1-10.

39. The eucaryotic cell in accordance with claim 38
wherein said cell is selected from the group consisting of mink
lung epithelial cells, HeLa cells, Chinese hamster ovary cells,
Hep3B cells, GM7373 cells and NIH 3T3 cells.

40. A kit useful in assaying the amount of TGF-β in a
liquid sample comprising (a) packaging material; (b) eucaryotic
cells contained within said packaging material, said cells
capable of expressing an indicator molecule and containing a
plasmid comprising, in the direction of transcription, a
regulatory region that includes at least one TGF-β inducible
response element that is operatively linked to a promoter, and
a structural region downstream of said promoter, said response
element being capable of inducing dose-dependent indicator
molecule activity and said structural region coding for said
indicator molecule; and (c) an aliquot of TGF-β contained
within said packaging material, said TGF-β used for generating
a reference curve representing a measured amount of the
indicator molecule produced from a known concentration of TGF-β.

41. The kit in accordance with claim 40 wherein said 
eucaryotic cells are selected from the group consisting of mink 
lung epithelial cells, HeLa cells, Chinese Hamster Ovary cells, 
Hep3B cells, GM7373 cells and NIH 3T3 cells. 

42. The kit in accordance with claim 40 wherein said 
plasmid comprises a nucleotide sequence that corresponds to a 
sequence selected from the group consisting of SEQ ID NOs 1-10. 

43. The kit in accordance with claim 40 wherein said 
plasmid comprises a plasmid having the identifying 
characteristics of a plasmid selected from the group consisting 
of plasmid ATCC Accession Number 75627, plasmid ATCC Accession 
Number 74628 and plasmid ATCC Accession Number 75629. 

44. The kit in accordance with claim 40 wherein said
packaging material comprises a label indicating that said
eucaryotic cells can be used for determining the amount of TGF-β
in said liquid sample comprising the steps of (a) incubating
said cells with said liquid sample; (b) measuring the amount of
said indicator molecule produced thereby; and (c) comparing the
amount of measured indicator molecule with said reference
curve.

45. The kit in accordance with claim 40 wherein said 
eucaryotic cells are stably transformed cells that contain the 
TGF-β response element having the nucleotide sequence in SEQ ID
NO 11, and wherein said cells correspond to cells on deposit 
with ATCC having the ATCC Accession Number CRL 11508. 

46. The kit in accordance with claim 40 further
comprising: (d) an anti-TGF-β antibody for use in a parallel
control assay for determining the amount of indicator molecule
produced other than by TGF-β induction.